Posted to dev@griffin.apache.org by Karan Gupta <ka...@tavant.com> on 2018/05/04 06:00:01 UTC

No Index Formation in Elastic Search

Hi Lionel,

Although the Spark application finishes, I do not see any index being created in Elasticsearch, so the data quality metrics are not being populated.
Could you help me with a possible solution?


Thank you,
Karan Gupta
________________________________
Any comments or statements made in this email are not necessarily those of Tavant Technologies. The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. If you have received this in error, please contact the sender and delete the material from any computer. All emails sent from or to Tavant Technologies may be subject to our monitoring procedures.

Re:RE: No Index Formation in Elastic Search

Posted by Lionel Liu <li...@apache.org>.
Hi Karan,


It's great that it works for you.
As far as I know, ES does not require indices to be created before any data is inserted. ES can generate the index and schema when the first value is inserted.
In our environment, we didn't create any index or mapping schema in ES, and it worked as well. I don't know why it fails in your environment. Which version of ES are you using?


This is why we don't have any documentation for the ES index. Different versions of ES may behave differently; if so, we'll investigate further and fix it.
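
As an illustration of the automatic schema generation described above, here is a minimal Python sketch that approximates the mapping ES would infer from a first metric document. It runs locally and does not contact ES. The function name infer_mapping is hypothetical, the type rules (integers become "long", strings become "text" with a "keyword" sub-field) are a simplification of the ES 5.x/6.x dynamic-mapping defaults, and the sample document is the accuracy metric quoted later in this thread.

```python
def infer_mapping(document):
    """Approximate the mapping ES dynamic mapping would infer for a JSON object.

    Simplified illustration only: handles nested objects, booleans, integers,
    floats and strings. Arrays, dates and nulls are deliberately not covered.
    """
    properties = {}
    for field, value in document.items():
        if isinstance(value, dict):
            # Nested JSON objects become nested "properties" blocks.
            properties[field] = infer_mapping(value)
        elif isinstance(value, bool):
            # bool must be checked before int, since bool is a subclass of int.
            properties[field] = {"type": "boolean"}
        elif isinstance(value, int):
            properties[field] = {"type": "long"}
        elif isinstance(value, float):
            properties[field] = {"type": "float"}
        elif isinstance(value, str):
            properties[field] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        else:
            raise TypeError("unsupported value for field %r" % field)
    return {"properties": properties}


# Sample accuracy metric taken from this thread.
metric = {
    "name": "accu_job",
    "tmst": 1524812400000,
    "value": {"total": 125000, "miss": 505, "matched": 124495},
}
mapping = infer_mapping(metric)
```

For this metric, the result has the same shape as the "accuracy" mapping returned by the _mapping request quoted below, which is consistent with ES having generated that mapping on its own.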


--

Regards,
Lionel, Liu



At 2018-05-11 17:42:22, "Karan Gupta" <ka...@tavant.com> wrote:
>Hi Lionel,
>
>Thank you for your quick reply.
>
>I recreated the ES index as you suggested.
>I no longer see any errors on the Griffin console, as I used to.
>But I don’t see any documents in the ES index either…
>The jobs are running and completing, though, and HDFS has the latest job run metrics.
>
>Any suggestions here?
>
>env.json has "method": "post" for the ES persist part.
>Should it be POST?
>
>Thanks,
>Best,
>Karan
>From: Lionel Liu <li...@apache.org>
>Sent: Friday, May 11, 2018 3:45 PM
>To: Karan Gupta <ka...@tavant.com>
>Cc: dev@griffin.incubator.apache.org
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>I've double-checked my environment. Sorry about my last reply; I pasted the schema from an old version.
>In the current version, the metric looks like this:
>{
>          "name" : "accu_job",
>          "tmst" : 1524812400000,
>          "value" : {
>            "total" : 125000,
>            "miss" : 505,
>            "matched" : 124495
>          }
>}
>
>I fetched the mapping schema with this command:
>curl -XGET '<ES IP>:9200/_mapping?pretty=true'
>
>And got this schema:
>{
>  "griffin" : {
>    "mappings" : {
>      "accuracy" : {
>        "properties" : {
>          "name" : {
>            "type" : "text",
>            "fields" : {
>              "keyword" : {
>                "type" : "keyword",
>                "ignore_above" : 256
>              }
>            }
>          },
>          "tmst" : {
>            "type" : "long"
>          },
>          "value" : {
>            "properties" : {
>              "matched" : {
>                "type" : "long"
>              },
>              "miss" : {
>                "type" : "long"
>              },
>              "total" : {
>                "type" : "long"
>              }
>            }
>          }
>        }
>      }
>    }
>  }
>}
>
>It's a bit different from the metrics persisted on HDFS: "name" corresponds to "metricName", "tmst" corresponds to "timestamp", and the "value" fields are exactly the same.
>{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}
>
>For the details you can refer to:
>https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HdfsPersist.scala#L334
>https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HttpPersist.scala#L110
>
>A later version may refactor the metrics schema; any such change will be highlighted in the release notes.
>
>
>Hope this helps you.
>
>Thanks,
>Lionel
>
>On Fri, May 11, 2018 at 5:52 PM, Karan Gupta <ka...@tavant.com> wrote:
>Hi,
>
>The following is a sample JSON record stored in HDFS by Griffin.
>It resides at: hdfs:///griffin/streaming/persist/job_names/1525804920000/_METRICS
>
>There are also _LOG, _START, and __missRecords files created for each job. I assume they are not meant to be stored in ES.
>
>Sample JSON:
>{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}
>
>This does not match the “schema” that you have outlined below.
>
>Are we using an older version of Griffin? Can you help me with some clarity?
>
>Thanks,
>Best,
>Karan
>From: Lionel Liu <li...@apache.org>
>Sent: Wednesday, May 9, 2018 11:36 AM
>To: dev@griffin.incubator.apache.org; Karan Gupta <ka...@tavant.com>
>
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>Sorry, I left out the field "__tmst", which is the timestamp attached to each output value record.
>The mappings schema should be:
>
>{
>  "mappings": {
>    "accuracy": {
>      "properties": {
>        "name" : {"type": "keyword"},
>        "tmst" : {"type": "long"},
>        "value" : {
>          "properties": {
>            "__tmst": {"type": "long"},
>            "total": {"type": "long"},
>            "miss": {"type": "long"},
>            "matched": {"type": "long"}
>          }
>        }
>      }
>    }
>  }
>}
>
>Thanks,
>Lionel
>
>On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <ka...@tavant.com> wrote:
>Hi Lionel,
>
>I tried the curl command below, which you sent me:
>
>curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type: application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" : {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties": {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type": "long"}}}}}}}'
>
>When I GET the indices, I can see that the griffin index has been created in Elasticsearch. I then ran the service jar again, but I still could not see the DQ metrics being populated.
>
>Am I missing something here?
>
>Thank you,
>Karan Gupta
>
>
>
>From: Lionel Liu <li...@apache.org>
>Sent: Friday, May 4, 2018 6:29 PM
>To: Karan Gupta <ka...@tavant.com>
>Cc: dev@griffin.incubator.apache.org
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>For accuracy, you can try mappings like this:
>
>curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{
>  "mappings": {
>    "accuracy": {
>      "properties": {
>        "name" : {"type": "keyword"},
>        "tmst" : {"type": "long"},
>        "value" : {
>          "properties": {
>            "total": {"type": "long"},
>            "miss": {"type": "long"},
>            "matched": {"type": "long"}
>          }
>        }
>      }
>    }
>  }
>}'
>
>The metric schema is like this:
>
>{
>  "name": "accuracy",
>  "tmst": 1525320600000,
>  "value": {
>    "total": 100000,
>    "miss": 200,
>    "matched": 99800
>  }
>}
>
>
>
>For profiling, you may need different mappings.
>
>In our wiki, you can get the metric schema here:
>https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema
>
>As far as I know, indices don't need to be created manually in ES; it creates the mappings from the first value posted. That's what we rely on in our Docker image, and it works.
>
>
>Thanks,
>Lionel
>
>
>On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <ka...@tavant.com> wrote:
>Hi Lionel,
>
>We are not using the Docker image, so we want to set it up manually.
>Could you provide us with the “CREATE” statement for the Griffin indices, along with the “mappings”?
>
>
>Thank you,
>Karan Gupta
>
>From: Lionel Liu <li...@apache.org>
>Sent: Friday, May 4, 2018 2:56 PM
>
>To: Karan Gupta <ka...@tavant.com>
>Cc: dev@griffin.incubator.apache.org
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>In our Docker image, we only configured 'http.cors.enabled: true' and 'http.cors.allow-origin: "*"' in elasticsearch.yml, as shown in the Dockerfile: https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
>That's all we've done for ES configuration, without any other initialization. When the Spark application posts metrics to ES directly, it succeeds.
>
>ES will generate the indices from the first value you post to it.
>
>Thanks,
>Lionel
>
>On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com> wrote:
>Hi Lionel,
>
>The metrics are being persisted in HDFS… This is good progress for us. Thank you for all your valuable help.
>
>We created an index for Griffin, but we were not sure which mappings to use.
>Until we created it, this index was never auto-created in ES.
>Now that we have created the index, there are errors that suggest missing “mappings”.
>
>Is there an auto-index-creation property that we need to enable somewhere in ES?
>I could not find anything in the config yml file, though.
>
>Thank you,
>Karan Gupta
>From: Lionel Liu <li...@apache.org>
>Sent: Friday, May 4, 2018 2:11 PM
>
>To: Karan Gupta <ka...@tavant.com>
>Cc: dev@griffin.incubator.apache.org
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>For HTTP persistence, are the metrics persisted directly from “Spark”, or does the Griffin service write them?
>[Answer] The metrics are persisted directly from the Spark application.
>
>Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from the Griffin service, it will work, but from Spark executors it won't work, as localhost resolves to the executor host)
>[Answer] I think you can change "localhost" to the IP address of ES.
>
>But we have not created any index in “ES” called “griffin” or “accuracy”. What should we be doing here?
>[Answer] You don't need to create the indices in ES; ES will create them when metrics are posted to it.
>
>The "email" and "sms" parameters are not enabled in this version; you can just ignore them in env.json.
>
>BTW, have the metrics been persisted on HDFS?
>
>Thanks,
>Lionel
>
>
>
>On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com> wrote:
>Hi,
>
>Thank you for the detail.
>
>In env.json, we have specified both HDFS and HTTP.
>For HTTP persistence, are the metrics persisted directly from “Spark”, or does the Griffin service write them?
>Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from the Griffin service, it will work, but from Spark executors it won't work, as localhost resolves to the executor host)
>But we have not created any index in “ES” called “griffin” or “accuracy”. What should we be doing here?
>
>One more:
>
>Yesterday we found that the “email” and “sms” parts of env.json are not configured properly.
>They appear as an “array” in the JSON, but “EmailParam” and “SmsParam” do not expect a List.
>This was preventing the Spark jobs from launching.
>We edited env.json accordingly. We hope we did the right thing.
>Can you confirm this?
>
>Thank you,
>Karan Gupta
>
>From: Lionel Liu <li...@apache.org>
>Sent: Friday, May 4, 2018 11:46 AM
>To: Karan Gupta <ka...@tavant.com>
>Cc: dev@griffin.incubator.apache.org
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>First, we need to check whether Griffin finished successfully. Which persist types did you configure in env.json? "log", "hdfs", "http"?
>- "log": prints the metrics in the application log.
>- "hdfs": persists the metrics in the HDFS path you've set.
>- "http": posts the metrics to the "api" you've set, which should be the Elasticsearch endpoint by default.
>
>You can choose more than one of them.
>If "http" is not configured correctly, posting metrics to ES fails.
>If "hdfs" is configured but no metrics are persisted in the "path", Griffin may not have finished the calculation correctly.
>If "log" is configured, you can get the application log from YARN:
>    yarn logs -applicationId <appId> > applog
>Then read the applog and check whether any output metric was calculated.
>If no metric is persisted by any of your configured persist types, read the applog and find the error message. Then you can show it to me, and I'll help you track it down.
>
>Thanks,
>Lionel
>On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com> wrote:
>Hi Lionel,
>
>Although the Spark application finishes, I do not see any index being created in Elasticsearch, so the data quality metrics are not being populated.
>Could you help me with a possible solution?
>
>
>Thank you,
>Karan Gupta

RE: Re:RE: No Index Formation in Elastic Search

Posted by Karan Gupta <ka...@tavant.com>.
// Sorry for the double post. I just wanted to record this in the thread so that it is useful for someone else.

Hi Lionel,

We were finally able to populate the Griffin DQ metrics and view them on the Griffin console.

Here is the final adjustment I made in the HttpPersist.scala -> def httpResult

Patching -> val header = Map[String, Object]("Content-Type"->"application/json")

Reference -> curl -X POST "http://<HOST ES::9200>/griffin/accuracy/" -H "Content-Type:application/json" -d @<FILENAME>

The above patch worked, and now I am able to see the Griffin DQ metrics.
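
For readers who want to reproduce the patched request outside of Griffin, the following is a minimal Python sketch (standard library only) of a metric POST that carries the Content-Type header. It is not the Griffin Scala code. The function name build_metric_request and the host es.example.invalid are hypothetical, and the metric values are the sample from this thread. One possible explanation for why the header mattered, which this thread does not confirm, is that Elasticsearch 6.0 and later reject request bodies sent without a supported Content-Type header.

```python
import json
import urllib.request


def build_metric_request(api_url, metric):
    """Build (but do not send) an HTTP POST carrying one metric as JSON."""
    body = json.dumps(metric).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        # The explicit header mirrors the patch described above.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical ES endpoint; replace the host with the real ES address.
api_url = "http://es.example.invalid:9200/griffin/accuracy"
metric = {
    "name": "job_names",
    "tmst": 1525804920000,
    "value": {"total": 19, "miss": 2, "matched": 17},
}
request = build_metric_request(api_url, metric)

# To actually send it, one would call urllib.request.urlopen(request);
# that call is left out so the sketch runs without a live ES instance.
```

Building the request separately from sending it makes it easy to inspect the method, headers and body before pointing it at a real cluster.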


We are using Spark 1.6.2 in our HDP cluster. Could that be the reason?
The scalaj.http libraries that we use are probably older, maybe. Not sure.

Anyway, all's well that ends well 😊

A big THANKS to @Lionel Liu for all his help!

Thank you,
Karan Gupta


From: bhlx3lyx7@163.com <bh...@163.com> On Behalf Of Lionel Liu
Sent: Saturday, May 12, 2018 4:38 PM
To: Karan Gupta <ka...@tavant.com>; dev@griffin.incubator.apache.org
Subject: Re:RE: No Index Formation in Elastic Search

Hi Karan,

In the Griffin service module, we do have the method "addMetricValues", but it only serves the "publish measure" feature, which lets users publish existing metrics into the same ES so that Griffin can render them in the UI.
In the Griffin measure module, which is the Spark application that calculates the metrics, metrics are persisted into ES through an HTTP POST request; this is how metrics reach ES. The implementation is here: https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HttpPersist.scala
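
Earlier in this thread, the record stored on HDFS and the document posted to ES were described as differing only in field names ("metricName" vs "name", "timestamp" vs "tmst"), with identical "value" fields. A minimal Python sketch of that renaming follows. The function name hdfs_record_to_es_document is hypothetical and this is not Griffin's own code; the sample record is the one from the _METRICS file quoted in this thread.

```python
import json


def hdfs_record_to_es_document(record):
    """Rename the HDFS metric fields to the names used in the ES document."""
    return {
        "name": record["metricName"],   # "metricName" on HDFS, "name" in ES
        "tmst": record["timestamp"],    # "timestamp" on HDFS, "tmst" in ES
        "value": record["value"],       # the value object is unchanged
    }


# One line of a _METRICS file, as quoted in this thread.
hdfs_line = (
    '{"metricName":"job_names","timestamp":1525804920000,'
    '"value":{"total":19,"miss":2,"matched":17}}'
)
es_document = hdfs_record_to_es_document(json.loads(hdfs_line))
```

A conversion like this could be handy when comparing what is on HDFS with what should appear in the ES index during troubleshooting.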

--
Regards,
Lionel, Liu


At 2018-05-11 18:16:27, "Karan Gupta" <ka...@tavant.com> wrote:

>Hi Lionel,
>
>I searched the source to find where the metrics are stored in ES.
>We found that the "addMetricValues" implementation in metricStoreImpl.java is the one that uses the "bulk" API of ES to update the metrics in ES.
>
>I added a few print statements there to examine what is being written,
>but my code was not getting called.
>I also see that this function is not called anywhere else in the Griffin source.
>But during Griffin startup, I see that this method is tied to a "web API" through Spring. See the log message below.
>
>Is the expectation that someone will call the web API to persist data from HDFS into ES?
>
>2018-05-11 05:54:02.613  INFO 125701 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/api/v1/metrics/values],methods=[POST]}" onto public org.springframework.http.ResponseEntity org.apache.griffin.core.metric.MetricController.addMetricValues(java.util.List<org.apache.griffin.core.metric.model.MetricValue>)
>
>Thanks,
>Best,
>Karan

>-----Original Message-----

>From: Karan Gupta <ka...@tavant.com>>

>Sent: Friday, May 11, 2018 4:12 PM

>To: Lionel Liu <li...@apache.org>>

>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>

>Subject: RE: No Index Formation in Elastic Search

>

>Hi Lionel,

>

>Thank you for your quick revert.

>

>I recreated the ES index as you suggested.

>I no more see any errors on Griffin console as I used to see earlier.

>But I don’t see any documents on the ES index either… The Jobs are running and completing though and HDFS is having the latest job run metrics.

>

>Any suggestions here?

>

>Env.json has "method": "post" for ES persist part.

>Should it be POST?

>

>Thanks,

>Best,

>Karan

>From: Lionel Liu <li...@apache.org>>

>Sent: Friday, May 11, 2018 3:45 PM

>To: Karan Gupta <ka...@tavant.com>>

>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>

>Subject: Re: No Index Formation in Elastic Search

>

>Hi Karan,

>

>I've double checked my environment, sorry for the last reply, I pasted the old version one.

>In the current version, the metric does like this:

>{

>          "name" : "accu_job",

>          "tmst" : 1524812400000,

>          "value" : {

>            "total" : 125000,

>            "miss" : 505,

>            "matched" : 124495

>          }

>}

>

>I curl for mapping schema by this command:

>curl -XGET '<ES IP>:9200/_mapping?pretty=true'

>

>And get the schema like this:

>{

>  "griffin" : {

>    "mappings" : {

>      "accuracy" : {

>        "properties" : {

>          "name" : {

>            "type" : "text",

>            "fields" : {

>              "keyword" : {

>                "type" : "keyword",

>                "ignore_above" : 256

>              }

>            }

>          },

>          "tmst" : {

>            "type" : "long"

>          },

>          "value" : {

>            "properties" : {

>              "matched" : {

>                "type" : "long"

>              },

>              "miss" : {

>                "type" : "long"

>              },

>              "total" : {

>                "type" : "long"

>              }

>            }

>          }

>        }

>      }

>    }

>  }

>}

>

>It's a bit different with the metrics persisted on hdfs, "name" equals "metricName", "tmst" equals "timestamp", and the "value" fields are exactly the same.

>{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}

>

>For the details you can refer to:

>https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-griffin%2Fblob%2Fmaster%2Fmeasure%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fgriffin%2Fmeasure%2Fpersist%2FHdfsPersist.scala%23L334&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ceeebd3c76cdf4da5e15a08d5b72bee0d%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=lfMazDuaVxpYg4JaiEiHlOL28CTgiYCSdIBz3gXrNtw%3D&reserved=0<https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-griffin%2Fblob%2Fmaster%2Fmeasure%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fgriffin%2Fmeasure%2Fpersist%2FHdfsPersist.scala%23L334&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ca05dc7bf691c427aae5908d5b7280334%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=iRLKXwRtUyjOkWFqjsLt86bEkuWP1%2Fs%2FXT5BtxOfA8w%3D&reserved=0<https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-griffin%2Fblob%2Fmaster%2Fmeasure%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fgriffin%2Fmeasure%2Fpersist%2FHdfsPersist.scala%23L334&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ceeebd3c76cdf4da5e15a08d5b72bee0d%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=lfMazDuaVxpYg4JaiEiHlOL28CTgiYCSdIBz3gXrNtw%3D&reserved=0%3chttps://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-griffin%2Fblob%2Fmaster%2Fmeasure%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fgriffin%2Fmeasure%2Fpersist%2FHdfsPersist.scala%23L334&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ca05dc7bf691c427aae5908d5b7280334%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=iRLKXwRtUyjOkWFqjsLt86bEkuWP1%2Fs%2FXT5BtxOfA8w%3D&reserved=0>>

>https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-griffin%2Fblob%2Fmaster%2Fmeasure%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fgriffin%2Fmeasure%2Fpersist%2FHttpPersist.scala%23L110&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ceeebd3c76cdf4da5e15a08d5b72bee0d%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=PYHgFduLJqU8SdkoJfY7KLiN1no8k4L48yAyT4myo%2Bc%3D&reserved=0<https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-griffin%2Fblob%2Fmaster%2Fmeasure%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fgriffin%2Fmeasure%2Fpersist%2FHttpPersist.scala%23L110&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ca05dc7bf691c427aae5908d5b7280334%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=%2Fmy94uiDl0tS8jmMVGBSA0tAo%2Ftd2DzAPx%2FeKAaPnbQ%3D&reserved=0<https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-griffin%2Fblob%2Fmaster%2Fmeasure%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fgriffin%2Fmeasure%2Fpersist%2FHttpPersist.scala%23L110&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ceeebd3c76cdf4da5e15a08d5b72bee0d%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=PYHgFduLJqU8SdkoJfY7KLiN1no8k4L48yAyT4myo%2Bc%3D&reserved=0%3chttps://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-griffin%2Fblob%2Fmaster%2Fmeasure%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fgriffin%2Fmeasure%2Fpersist%2FHttpPersist.scala%23L110&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ca05dc7bf691c427aae5908d5b7280334%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=%2Fmy94uiDl0tS8jmMVGBSA0tAo%2Ftd2DzAPx%2FeKAaPnbQ%3D&reserved=0>>

>

>There might be some modification in the later version, to refactor the metrics schema, and will also be highlighted in release notes.

>

>

>Hope this helps you.

>

>Thanks,

>Lionel

>

>On Fri, May 11, 2018 at 5:52 PM, Karan Gupta <ka...@tavant.com>>> wrote:

>Hi,

>

>Following is a sample JSON that is stored in HDFS by Griffin.

>It resides in : hdfs:///griffin/streaming/persist/job_names/1525804920000/_METRICS

>

>There are also _LOG, _START, __missRecords files created for each Job. I assume they are not meant for storage in ES.

>

>Sample JSON:

>{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}

>

>This does not match the “schema” that you have outlined below.

>

>Are we using an older version of Griffin? Can you help me with some clarity?

>

>Thanks,

>Best,

>Karan

>From: Lionel Liu <li...@apache.org>>>

>Sent: Wednesday, May 9, 2018 11:36 AM

>To: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>; Karan Gupta <ka...@tavant.com>>>

>

>Subject: Re: No Index Formation in Elastic Search

>

>Hi Karan,

>

>Sorry for the missing field "__tmst", which is the timestamp with each output value record.

>The mappings schema should be:

>

>{

>  "mappings": {

>    "accuracy": {

>      "properties": {

>        "name" : {"type": "keyword"},

>        "tmst" : {"type": "long"},

>        "value" : {

>          "properties": {

>            "__tmst": {"type": "long"},

>            "total": {"type": "long"},

>            "miss": {"type": "long"},

>            "matched": {"type": "long"}

>          }

>        }

>      }

>    }

>  }

>}

>

>Thanks,

>Lionel

>

>On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <ka...@tavant.com>>> wrote:

>Hi Lionel,

>

>I tried the below CURL which you sent me

>

>curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type: application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" : {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties": {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type": "long"}}}}}}}'

>

>When I try to GET the indexes, I can see that griffin index has been created in the elastic search. Then I ran the service jar again but I could not see DQ Metric getting populated.

>

>Am I missing something here?

>

>Thank you,

>Karan Gupta

>

>

>

>From: Lionel Liu <li...@apache.org>>>

>Sent: Friday, May 4, 2018 6:29 PM

>To: Karan Gupta <ka...@tavant.com>>>

>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>

>Subject: Re: No Index Formation in Elastic Search

>

>Hi Karan,

>

>For accuracy, you can try mappings like this:

>

>curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{

>  "mappings": {

>    "accuracy": {

>      "properties": {

>            "name" : {"type": "keyword"},

>            "tmst" : {"type": "long"},

>            "value" : {

>              "properties": {

>                             "total": {"type": "long"},

>                             "miss": {"type": "long"},

>                             "matched": {"type": "long"}

>              }

>            }

>                }

>              }

>  }

>}'

>

>The metric schema is like this:

>

>{

>

>       "name": "accuracy",

>

>       "tmst":1525320600000

>

>       "value": {

>

>              "total": 100000,

>

>              "miss": 200,

>

>              "matched": 99800

>

>       }

>

>}

>

>

>

>For profiling, you may need another mappings.

>

>In our wiki, you can get the metric schema here:

>https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGRIFFIN%2F4.%2BMetric%2Bschema&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ceeebd3c76cdf4da5e15a08d5b72bee0d%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=vWcQMS70rYcTDz8UsHv0DrYwH%2FhbR7G5CnYBRBw0MqE%3D&reserved=0<https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGRIFFIN%2F4.%2BMetric%2Bschema&data=01%7C01%7Ckaran.gupta%40tavant.com%7C18cc221a4ad84a2f6ce908d5b57301e4%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=tYTlOzDMKU89%2BCigCFxKVk2AK5fg3%2B%2FolzuqnYXcS8o%3D&reserved=0><https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGRIFFIN%2F4.%2BMetric%2Bschema&data=01%7C01%7Ckaran.gupta%40tavant.com%7C39a2fe6e368d416be4ed08d5b1bec7cd%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=tNs%2FyRj9XrtC1hM8DZEErEbJV0kAAYcIm7tbuLCKlSg%3D&reserved=0<https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGRIFFIN%2F4.%2BMetric%2Bschema&data=01%7C01%7Ckaran.gupta%40tavant.com%7Ceeebd3c76cdf4da5e15a08d5b72bee0d%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=vWcQMS70rYcTDz8UsHv0DrYwH%2FhbR7G5CnYBRBw0MqE%3D&reserved=0%3chttps://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGRIFFIN%2F4.%2BMetric%2Bschema&data=01%7C01%7Ckaran.gupta%40tavant.com%7C18cc221a4ad84a2f6ce908d5b57301e4%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=tYTlOzDMKU89%2BCigCFxKVk2AK5fg3%2B%2FolzuqnYXcS8o%3D&reserved=0%3e%3chttps://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGRIFFIN%2F4.%2BMetric%2Bschema&data=01%7C01%7Ckaran.gupta%40tavant.com%7C39a2fe6e368d416be4ed08d5b1bec7cd%7Cc6c1e9da5d0c4f8f9a023c67206efbd6%7C0&sdata=tNs%2FyRj9XrtC1hM8DZEErEbJV0kAAYcIm7tbuLCKlSg%3D&reserved=0>>

>

>As I know, ES doesn't need to create indices manually, it will create the mappings by the first value posted. That's what we do in our docker image, and it works.

>

>

>Thanks,

>Lionel

>

>

>On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <ka...@tavant.com>>>> wrote:

>Hi Lionel,

>

>We are not using Docker Image, hence we want to set it up manually.

>Could you provide us the “CREATE” statement for griffin indices along with “mappings”.

>

>

>Thank you,

>Karan Gupta

>

>From: Lionel Liu <li...@apache.org>>>>

>Sent: Friday, May 4, 2018 2:56 PM

>

>To: Karan Gupta <ka...@tavant.com>>>>

>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>>

>Subject: Re: No Index Formation in Elastic Search

>

>Hi Karan,

>

>In our docker image, we only configured 'http.cors.enabled: true' and 'http.cors.allow-origin: "*"' in elasticsearch.yml, as in the Dockerfile: https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile

>That is all we have done for ES configuration, without any other initialization. When the Spark application posts metrics to ES directly, it succeeds.

>

>ES will generate the indices from the first value you post to it.

>

>Thanks,

>Lionel

>

>On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com>>>> wrote:

>HI Lionel,

>

>The metrics is being persisted in HDFS… This is good progress for us. Thank you for all your valuable help.

>

>We created an index for Griffin but we were not sure about what mappings we should use.

>Until we created this, we never got this index auto-created in ES…..

>And now that we have created the index, there are errors which are suggestive of missing “mappings”

>

>Is there an auto index create property that we need to enable somewhere in ES?

>I could not find anything in the config yml file though….

>

>Thank you,

>Karan Gupta

>From: Lionel Liu <li...@apache.org>>>>

>Sent: Friday, May 4, 2018 2:11 PM

>

>To: Karan Gupta <ka...@tavant.com>>>>

>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>>

>Subject: Re: No Index Formation in Elastic Search

>

>Hi Karan,

>

>For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?

>[Answer] The metrics are persisted directly from spark application.

>

>Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it won't work as localhost resolves to executor host)

>[Answer] I think you can change "localhost" to the IP address of ES.

>

>But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?

>[Answer] You don't need to create the indices in ES; ES will create them when metrics are posted to it.

>

>For the "email" and "sms" parameters, they are not enabled in this version, you can just ignore them in env.json.

>

>BTW, has the metrics been persisted on HDFS?

>

>Thanks,

>Lionel

>

>

>

>On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com>>>> wrote:

>Hi,

>

>Thank you for the detail.

>

>In env.json, we have specified both HDFS and HTTP.

>For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?

>Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host) But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?

>

>One more:

>

>Yesterday we found that “email” and “sms” parts of the env.json are not configured properly.

>They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not expect a List… This was causing Spark jobs not to launch.

>We edited the env.json accordingly…. We hope we did the right thing… Can you confirm this?

>

>Thank you,

>Karan Gupta

>

>From: Lionel Liu <li...@apache.org>>>>

>Sent: Friday, May 4, 2018 11:46 AM

>To: Karan Gupta <ka...@tavant.com>>>>

>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>>

>Subject: Re: No Index Formation in Elastic Search

>

>Hi Karan,

>

>First, we need to check whether Griffin finished successfully. What persist types did you configure in env.json? "log", "hdfs", "http"?

>- "log": print the metrics in application log.

>- "hdfs": the metrics will be persisted in hdfs path you've set.

>- "http": post the metrics to the "api" you've set, which should be the elasticsearch endpoint by default.

>

>You can choose multiple of them.

>If "http" is not configured correctly, posting metrics to ES fails.

>If "hdfs" is configured but no metric is persisted in the "path", Griffin may not have finished the calculation correctly.

>If "log" is configured, you can get the application log from yarn:

>    yarn logs -applicationId <appId> > applog

>Then read the applog and check whether any output metric was calculated.

>If no metric is persisted by any of your configured persist types, read the applog and find the error message. You can then show it to me, and I'll help you track it down.

>

>Thanks,

>Lionel

>On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com>>>> wrote:

>Hi Lionel,

>

>While the Spark Application gets finished, I do not see any Index getting created in the elastic search, hence I do not see the data quality metrics getting populated.

>Could you help me out with a possible solution?

>

>

>Thank you,

>Karan Gupta

>________________________________

>Any comments or statements made in this email are not necessarily those of Tavant Technologies. The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. If you have received this in error, please contact the sender and delete the material from any computer. All emails sent from or to Tavant Technologies may be subject to our monitoring procedures.


Re:RE: No Index Formation in Elastic Search

Posted by Lionel Liu <li...@apache.org>.
Hi Karan,


In the Griffin service module, we do have a method "addMetricValues", but it only serves the "publish measure" feature, which lets users publish existing metrics into the same ES so that Griffin can also render them on the UI.
In the Griffin measure module, which is the Spark application that calculates the metrics, metrics are persisted into ES through an HTTP POST request, as implemented here: https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HttpPersist.scala. This is how metrics get persisted into ES.
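
For illustration only, here is a minimal Python sketch of what such an HTTP POST of one metric document could look like. It is not the Griffin implementation (that is the Scala code in HttpPersist.scala); the host name and the helper name build_metric_post are hypothetical, and the document fields follow the metric example quoted further down in this thread. The request is only built, not sent.

```python
import json
import urllib.request


def build_metric_post(api_url, name, tmst, value):
    """Build (but do not send) a POST request carrying one metric document."""
    doc = {"name": name, "tmst": tmst, "value": value}
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical ES host; "griffin/accuracy" is the index/type path from the thread.
req = build_metric_post(
    "http://es-host.example:9200/griffin/accuracy",
    "accu_job",
    1524812400000,
    {"total": 125000, "miss": 505, "matched": 124495},
)
# Calling urllib.request.urlopen(req) would actually send the document to ES.
```

Note that the URL must be reachable from wherever the Spark application runs, which is why "localhost" in the "api" setting can fail on a cluster.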


--

Regards,
Lionel, Liu



At 2018-05-11 18:16:27, "Karan Gupta" <ka...@tavant.com> wrote:
>Hi Lionel,
>
>I searched the source to find where the metrics are stored in ES.
>We found that "addMetricValues" implementation in metricStoreImpl.java is the one that uses "bulk" API of ES to update the metrics in ES.
>
>I added a few print statements there to examine what is being written,
>but my code was not getting called.
>I also see that this function is not called anywhere else in the Griffin source.
>But in the Griffin startup log, I see that this method is tied to a "web API" through Spring... See the message below...
>
>Is the expectation that someone will call the web API to persist data from HDFS into ES?
>
>2018-05-11 05:54:02.613  INFO 125701 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/api/v1/metrics/values],methods=[POST]}" onto public org.springframework.http.ResponseEntity org.apache.griffin.core.metric.MetricController.addMetricValues(java.util.List<org.apache.griffin.core.metric.model.MetricValue>)
>
>Thanks,
>Best,
>Karan
>-----Original Message-----
>From: Karan Gupta <ka...@tavant.com> 
>Sent: Friday, May 11, 2018 4:12 PM
>To: Lionel Liu <li...@apache.org>
>Cc: dev@griffin.incubator.apache.org
>Subject: RE: No Index Formation in Elastic Search
>
>Hi Lionel,
>
>Thank you for your quick reply.
>
>I recreated the ES index as you suggested.
>I no longer see the errors on the Griffin console that I used to see earlier.
>But I don’t see any documents in the ES index either… The jobs are running and completing, though, and HDFS has the metrics from the latest job run.
>
>Any suggestions here?
>
>Env.json has "method": "post" for ES persist part.
>Should it be POST?
>
>Thanks,
>Best,
>Karan
>From: Lionel Liu <li...@apache.org>
>Sent: Friday, May 11, 2018 3:45 PM
>To: Karan Gupta <ka...@tavant.com>
>Cc: dev@griffin.incubator.apache.org
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>I've double-checked my environment. Sorry about my last reply; I pasted the old version.
>In the current version, the metric looks like this:
>{
>          "name" : "accu_job",
>          "tmst" : 1524812400000,
>          "value" : {
>            "total" : 125000,
>            "miss" : 505,
>            "matched" : 124495
>          }
>}
>
>I curl for mapping schema by this command:
>curl -XGET '<ES IP>:9200/_mapping?pretty=true'
>
>And get the schema like this:
>{
>  "griffin" : {
>    "mappings" : {
>      "accuracy" : {
>        "properties" : {
>          "name" : {
>            "type" : "text",
>            "fields" : {
>              "keyword" : {
>                "type" : "keyword",
>                "ignore_above" : 256
>              }
>            }
>          },
>          "tmst" : {
>            "type" : "long"
>          },
>          "value" : {
>            "properties" : {
>              "matched" : {
>                "type" : "long"
>              },
>              "miss" : {
>                "type" : "long"
>              },
>              "total" : {
>                "type" : "long"
>              }
>            }
>          }
>        }
>      }
>    }
>  }
>}
>
>It differs slightly from the metrics persisted on HDFS: "name" corresponds to "metricName", "tmst" corresponds to "timestamp", and the "value" fields are exactly the same.
>{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}
>
>For the details you can refer to:
>https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HdfsPersist.scala#L334
>https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HttpPersist.scala#L110
>
>The metrics schema might be refactored in a later version; any such change will be highlighted in the release notes.
>
>
>Hope this helps you.
>
>Thanks,
>Lionel
>
>On Fri, May 11, 2018 at 5:52 PM, Karan Gupta <ka...@tavant.com>> wrote:
>Hi,
>
>Following is a sample JSON that is stored in HDFS by Griffin.
>It resides in : hdfs:///griffin/streaming/persist/job_names/1525804920000/_METRICS
>
>There are also _LOG, _START, __missRecords files created for each Job. I assume they are not meant for storage in ES.
>
>Sample JSON:
>{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}
>
>This does not match the “schema” that you have outlined below.
>
>Are we using an older version of Griffin? Can you help me with some clarity?
>
>Thanks,
>Best,
>Karan
>From: Lionel Liu <li...@apache.org>>
>Sent: Wednesday, May 9, 2018 11:36 AM
>To: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>; Karan Gupta <ka...@tavant.com>>
>
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>Sorry for the missing field "__tmst", which is the timestamp with each output value record.
>The mappings schema should be:
>
>{
>  "mappings": {
>    "accuracy": {
>      "properties": {
>        "name" : {"type": "keyword"},
>        "tmst" : {"type": "long"},
>        "value" : {
>          "properties": {
>            "__tmst": {"type": "long"},
>            "total": {"type": "long"},
>            "miss": {"type": "long"},
>            "matched": {"type": "long"}
>          }
>        }
>      }
>    }
>  }
>}
>
>Thanks,
>Lionel
>
>On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <ka...@tavant.com>> wrote:
>Hi Lionel,
>
>I tried the below CURL which you sent me
>
>curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type: application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" : {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties": {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type": "long"}}}}}}}'
>
>When I try to GET the indexes, I can see that griffin index has been created in the elastic search. Then I ran the service jar again but I could not see DQ Metric getting populated.
>
>Am I missing something here?
>
>Thank you,
>Karan Gupta
>
>
>
>From: Lionel Liu <li...@apache.org>>
>Sent: Friday, May 4, 2018 6:29 PM
>To: Karan Gupta <ka...@tavant.com>>
>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>For accuracy, you can try mappings like this:
>
>curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{
>  "mappings": {
>    "accuracy": {
>      "properties": {
>        "name" : {"type": "keyword"},
>        "tmst" : {"type": "long"},
>        "value" : {
>          "properties": {
>            "total": {"type": "long"},
>            "miss": {"type": "long"},
>            "matched": {"type": "long"}
>          }
>        }
>      }
>    }
>  }
>}'
>
>The metric schema is like this:
>
>{
>       "name": "accuracy",
>       "tmst": 1525320600000,
>       "value": {
>              "total": 100000,
>              "miss": 200,
>              "matched": 99800
>       }
>}
>
>
>
>For profiling, you may need a different mapping.
>
>In our wiki, you can get the metric schema here:
>https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema
>
>As far as I know, indices don't need to be created manually in ES; it will create the mappings from the first value posted. That's what we do in our docker image, and it works.
>
>
>Thanks,
>Lionel
>
>
>On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <ka...@tavant.com>>> wrote:
>Hi Lionel,
>
>We are not using Docker Image, hence we want to set it up manually.
>Could you provide us the “CREATE” statement for griffin indices along with “mappings”.
>
>
>Thank you,
>Karan Gupta
>
>From: Lionel Liu <li...@apache.org>>>
>Sent: Friday, May 4, 2018 2:56 PM
>
>To: Karan Gupta <ka...@tavant.com>>>
>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>In our docker image, we only configured 'http.cors.enabled: true' and 'http.cors.allow-origin: "*"' in elasticsearch.yml, as in the Dockerfile: https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
>That is all we have done for ES configuration, without any other initialization. When the Spark application posts metrics to ES directly, it succeeds.
>
>ES will generate the indices from the first value you post to it.
>
>Thanks,
>Lionel
>
>On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com>>> wrote:
>HI Lionel,
>
>The metrics are being persisted in HDFS… This is good progress for us. Thank you for all your valuable help.
>
>We created an index for Griffin but we were not sure about what mappings we should use.
>Until we created this, we never got this index auto-created in ES…..
>And now that we have created the index, there are errors that suggest missing “mappings”.
>
>Is there an auto index create property that we need to enable somewhere in ES?
>I could not find anything in the config yml file though….
>
>Thank you,
>Karan Gupta
>From: Lionel Liu <li...@apache.org>>>
>Sent: Friday, May 4, 2018 2:11 PM
>
>To: Karan Gupta <ka...@tavant.com>>>
>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
>[Answer] The metrics are persisted directly from spark application.
>
>Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it won't work as localhost resolves to executor host)
>[Answer] I think you can change "localhost" to the IP address of ES.
>
>But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?
>[Answer] You don't need to create the indices in ES; ES will create them when metrics are posted to it.
>
>For the "email" and "sms" parameters, they are not enabled in this version, you can just ignore them in env.json.
>
>BTW, has the metrics been persisted on HDFS?
>
>Thanks,
>Lionel
>
>
>
>On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com>>> wrote:
>Hi,
>
>Thank you for the detail.
>
>In env.json, we have specified both HDFS and HTTP.
>For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
>Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host) But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?
>
>One more:
>
>Yesterday we found that “email” and “sms” parts of the env.json are not configured properly.
>They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not expect a List… This was causing Spark jobs not to launch.
>We edited the env.json accordingly…. We hope we did the right thing… Can you confirm this?
>
>Thank you,
>Karan Gupta
>
>From: Lionel Liu <li...@apache.org>>>
>Sent: Friday, May 4, 2018 11:46 AM
>To: Karan Gupta <ka...@tavant.com>>>
>Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>First, we need to check whether Griffin finished successfully. What persist types did you configure in env.json? "log", "hdfs", "http"?
>- "log": print the metrics in application log.
>- "hdfs": the metrics will be persisted in hdfs path you've set.
>- "http": post the metrics to the "api" you've set, which should be the elasticsearch endpoint by default.
>
>You can choose multiple of them.
>If "http" is not configured correctly, posting metrics to ES fails.
>If "hdfs" is configured but no metric is persisted in the "path", Griffin may not have finished the calculation correctly.
>If "log" is configured, you can get the application log from yarn:
>    yarn logs -applicationId <appId> > applog
>Then read the applog and check whether any output metric was calculated.
>If no metric is persisted by any of your configured persist types, read the applog and find the error message. You can then show it to me, and I'll help you track it down.
>
>Thanks,
>Lionel
>On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com>>> wrote:
>Hi Lionel,
>
>While the Spark Application gets finished, I do not see any Index getting created in the elastic search, hence I do not see the data quality metrics getting populated.
>Could you help me out with a possible solution?
>
>
>Thank you,
>Karan Gupta
>________________________________
>Any comments or statements made in this email are not necessarily those of Tavant Technologies. The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. If you have received this in error, please contact the sender and delete the material from any computer. All emails sent from or to Tavant Technologies may be subject to our monitoring procedures.
>
>
>
>

RE: No Index Formation in Elastic Search

Posted by Karan Gupta <ka...@tavant.com>.
Hi Lionel,

I searched the source to find where the metrics are stored in ES.
We found that "addMetricValues" implementation in metricStoreImpl.java is the one that uses "bulk" API of ES to update the metrics in ES.

I added a few print statements there to examine what is being written,
but my code was not getting called.
I also see that this function is not called anywhere else in the Griffin source.
But in the Griffin startup log, I see that this method is tied to a "web API" through Spring... See the message below...

Is the expectation that someone will call the web API to persist data from HDFS into ES?

2018-05-11 05:54:02.613  INFO 125701 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/api/v1/metrics/values],methods=[POST]}" onto public org.springframework.http.ResponseEntity org.apache.griffin.core.metric.MetricController.addMetricValues(java.util.List<org.apache.griffin.core.metric.model.MetricValue>)

Thanks,
Best,
Karan
-----Original Message-----
From: Karan Gupta <ka...@tavant.com> 
Sent: Friday, May 11, 2018 4:12 PM
To: Lionel Liu <li...@apache.org>
Cc: dev@griffin.incubator.apache.org
Subject: RE: No Index Formation in Elastic Search

Hi Lionel,

Thank you for your quick reply.

I recreated the ES index as you suggested.
I no longer see the errors on the Griffin console that I used to see earlier.
But I don’t see any documents in the ES index either… The jobs are running and completing, though, and HDFS has the metrics from the latest job run.

Any suggestions here?

Env.json has "method": "post" for ES persist part.
Should it be POST?

Thanks,
Best,
Karan
From: Lionel Liu <li...@apache.org>
Sent: Friday, May 11, 2018 3:45 PM
To: Karan Gupta <ka...@tavant.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

I've double-checked my environment. Sorry about my last reply; I pasted the old version.
In the current version, the metric looks like this:
{
          "name" : "accu_job",
          "tmst" : 1524812400000,
          "value" : {
            "total" : 125000,
            "miss" : 505,
            "matched" : 124495
          }
}

I curl for mapping schema by this command:
curl -XGET '<ES IP>:9200/_mapping?pretty=true'

And get the schema like this:
{
  "griffin" : {
    "mappings" : {
      "accuracy" : {
        "properties" : {
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "tmst" : {
            "type" : "long"
          },
          "value" : {
            "properties" : {
              "matched" : {
                "type" : "long"
              },
              "miss" : {
                "type" : "long"
              },
              "total" : {
                "type" : "long"
              }
            }
          }
        }
      }
    }
  }
}

It differs slightly from the metrics persisted on HDFS: "name" corresponds to "metricName", "tmst" corresponds to "timestamp", and the "value" fields are exactly the same.
{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}
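
As a rough illustration of that field correspondence, the following Python sketch converts one HDFS-style metric record into the ES-style document. The function name hdfs_metric_to_es is made up for this example and is not part of Griffin; the sample record is the one from this thread.

```python
import json


def hdfs_metric_to_es(record):
    """Rename the HDFS-style keys to the ES-style keys; 'value' is copied as-is."""
    return {
        "name": record["metricName"],
        "tmst": record["timestamp"],
        "value": dict(record["value"]),
    }


hdfs_line = (
    '{"metricName":"job_names","timestamp":1525804920000,'
    '"value":{"total":19,"miss":2,"matched":17}}'
)
es_doc = hdfs_metric_to_es(json.loads(hdfs_line))
print(json.dumps(es_doc))
```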

For the details you can refer to:
https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HdfsPersist.scala#L334
https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HttpPersist.scala#L110

The metrics schema might be refactored in a later version; any such change will be highlighted in the release notes.


Hope this helps you.

Thanks,
Lionel

On Fri, May 11, 2018 at 5:52 PM, Karan Gupta <ka...@tavant.com>> wrote:
Hi,

Following is a sample JSON that is stored in HDFS by Griffin.
It resides in : hdfs:///griffin/streaming/persist/job_names/1525804920000/_METRICS

There are also _LOG, _START, __missRecords files created for each Job. I assume they are not meant for storage in ES.

Sample JSON:
{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}

This does not match the “schema” that you have outlined below.

Are we using an older version of Griffin? Can you help me with some clarity?

Thanks,
Best,
Karan
From: Lionel Liu <li...@apache.org>>
Sent: Wednesday, May 9, 2018 11:36 AM
To: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>; Karan Gupta <ka...@tavant.com>>

Subject: Re: No Index Formation in Elastic Search

Hi Karan,

Sorry for the missing field "__tmst", which is the timestamp with each output value record.
The mappings schema should be:

{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "__tmst": {"type": "long"},
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}
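
A hedged sketch of how this mapping could be sent when creating the index, written in Python instead of curl. The host name and the helper name build_create_index_request are hypothetical; the mapping body is the one given above, including the "accuracy" mapping type. Newer Elasticsearch releases have dropped mapping types, so this typed form only fits the older versions discussed in this thread. The request is built but not sent.

```python
import json
import urllib.request

# Mapping body as given above, with "__tmst" inside "value".
ACCURACY_MAPPING = {
    "mappings": {
        "accuracy": {
            "properties": {
                "name": {"type": "keyword"},
                "tmst": {"type": "long"},
                "value": {
                    "properties": {
                        "__tmst": {"type": "long"},
                        "total": {"type": "long"},
                        "miss": {"type": "long"},
                        "matched": {"type": "long"},
                    }
                },
            }
        }
    }
}


def build_create_index_request(es_base_url, index="griffin"):
    """Build (but do not send) the PUT request that creates the index."""
    body = json.dumps(ACCURACY_MAPPING).encode("utf-8")
    return urllib.request.Request(
        f"{es_base_url}/{index}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical ES address; urllib.request.urlopen(create_req) would send it.
create_req = build_create_index_request("http://es-host.example:9200")
```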

Thanks,
Lionel

On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <ka...@tavant.com>> wrote:
Hi Lionel,

I tried the below CURL which you sent me

curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type: application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" : {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties": {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type": "long"}}}}}}}'

When I try to GET the indexes, I can see that griffin index has been created in the elastic search. Then I ran the service jar again but I could not see DQ Metric getting populated.

Am I missing something here?

Thank you,
Karan Gupta



From: Lionel Liu <li...@apache.org>>
Sent: Friday, May 4, 2018 6:29 PM
To: Karan Gupta <ka...@tavant.com>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For accuracy, you can try mappings like this:

curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}'

The metric schema is like this:

{
       "name": "accuracy",
       "tmst": 1525320600000,
       "value": {
              "total": 100000,
              "miss": 200,
              "matched": 99800
       }
}



For profiling, you may need a different mapping.

In our wiki, you can get the metric schema here:
https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema

As far as I know, indices don't need to be created manually in ES; it will create the mappings from the first value posted. That's what we do in our docker image, and it works.


Thanks,
Lionel


On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <ka...@tavant.com>>> wrote:
Hi Lionel,

We are not using Docker Image, hence we want to set it up manually.
Could you provide us the “CREATE” statement for griffin indices along with “mappings”.


Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>>>
Sent: Friday, May 4, 2018 2:56 PM

To: Karan Gupta <ka...@tavant.com>>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

In our docker image, we only configured 'http.cors.enabled: true' and 'http.cors.allow-origin: "*"' in elasticsearch.yml, as in the Dockerfile: https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
That's all we've done for ES configuration, without any other initialization. And when the spark application posts metrics to ES directly, it succeeds.

ES will generate the indices by the first value you post to it.

Thanks,
Lionel

On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com>>> wrote:
HI Lionel,

The metrics is being persisted in HDFS… This is good progress for us. Thank you for all your valuable help.

We created an index for Griffin but we were not sure about what mappings we should use.
Until we created this, we never got this index auto-created in ES…..
And now that we have created the index, there are errors which are suggestive of missing “mappings”

Is there an auto index create property that we need to enable somewhere in ES?
I could not find anything in the config yml file though….

Thank you,
Karan Gupta
From: Lionel Liu <li...@apache.org>>>
Sent: Friday, May 4, 2018 2:11 PM

To: Karan Gupta <ka...@tavant.com>>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
[Answer] The metrics are persisted directly from spark application.

Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
[Answer] I think you can modify "localhost" to the ip address of ES.

But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?
[Answer] You don't need to create the indices in ES; ES will create them when you post metrics to it.
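
As a rough illustration of the two answers above (use the ES address instead of "localhost", and let ES create the index on the first post), here is a minimal Python sketch. It is not Griffin's code: the function names and the 10.0.0.5 address are made up for the example, and the request is only built, not sent.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse


def is_loopback(host):
    """Return True if host is 'localhost' or a loopback IP literal."""
    if host is None:
        return False
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name); treat as non-loopback.
        return False


def build_metric_request(api, metric):
    """Build (but do not send) a POST request carrying one metric document."""
    host = urlparse(api).hostname
    if is_loopback(host):
        # On a Spark executor, a loopback address points at the executor host.
        raise ValueError("api host %r is a loopback address" % host)
    body = json.dumps(metric).encode("utf-8")
    return urllib.request.Request(
        api,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical ES address; replace it with the real one.
metric = {
    "name": "accuracy",
    "tmst": 1525320600000,
    "value": {"total": 100000, "miss": 200, "matched": 99800},
}
req = build_metric_request("http://10.0.0.5:9200/griffin/accuracy", metric)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it. ES creates a missing index on the first document only while automatic index creation is allowed (the action.auto_create_index setting, which is on by default), so that setting is worth checking if the index never appears.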

For the "email" and "sms" parameters, they are not enabled in this version, you can just ignore them in env.json.

BTW, has the metrics been persisted on HDFS?

Thanks,
Lionel



On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com>>> wrote:
Hi,

Thank you for the detail.

In env.json, we have specified both HDFS and HTTP.
For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?

One more:

Yesterday we found that “email” and “sms” parts of the env.json are not configured properly.
They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not expect a List…
This was causing Spark jobs not to launch.
We edited the env.json accordingly…. We hope we did the right thing…
Can you confirm this?

Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>>>
Sent: Friday, May 4, 2018 11:46 AM
To: Karan Gupta <ka...@tavant.com>>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

First, we need to check whether griffin has finished successfully. What persist types did you configure in env.json? "log", "hdfs", "http"?
- "log": print the metrics in the application log.
- "hdfs": the metrics will be persisted in the hdfs path you've set.
- "http": post the metrics to the "api" you've set, which should be the elasticsearch endpoint by default.

You can choose several of them.
If "http" is not configured correctly, posting metrics to ES fails.
If "hdfs" is configured but you cannot find any metric persisted in the "path", griffin may not have finished the calculation correctly.
If "log" is configured, you can get the application log from yarn:
    yarn logs -applicationId <appId> > applog
Then read the applog and check whether any output metric was calculated.
If no metric is persisted by any of your persist types, you need to read the applog and find the error message. Then you can show it to me, and I'll help you track it down.
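
The checks above can be sketched as a small Python script that reads the persist section of env.json and flags obvious problems. This is only an illustration: the "persist" / "type" / "config" nesting and the sample values are assumptions, since this thread only mentions the "path", "api" and "method" settings, and check_persist is a made-up helper, not part of Griffin.

```python
import json
from urllib.parse import urlparse

# Hypothetical env.json fragment. The nesting is assumed; adjust it to match
# the env.json you actually use.
ENV_JSON = """
{
  "persist": [
    {"type": "log", "config": {}},
    {"type": "hdfs", "config": {"path": "hdfs:///griffin/streaming/persist"}},
    {"type": "http", "config": {"method": "post",
                                "api": "http://localhost:9200/griffin/accuracy"}}
  ]
}
"""


def check_persist(env):
    """Return (configured persist types, list of warnings)."""
    types, warnings = [], []
    for entry in env.get("persist", []):
        ptype = entry.get("type")
        config = entry.get("config", {})
        types.append(ptype)
        if ptype == "hdfs" and not config.get("path"):
            warnings.append("hdfs persist has no path")
        if ptype == "http":
            host = urlparse(config.get("api", "")).hostname
            if not host:
                warnings.append("http persist has no usable api URL")
            elif host in ("localhost", "127.0.0.1"):
                # Metrics are posted from the Spark application, so a local
                # address would point at the Spark host, not at ES.
                warnings.append("http api uses %s" % host)
    return types, warnings


types, warnings = check_persist(json.loads(ENV_JSON))
print(types)
for warning in warnings:
    print("WARNING:", warning)
```

A clean result here does not prove the job ran; the applog is still the place to look for the actual error.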

Thanks,
Lionel
On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com>>> wrote:
Hi Lionel,

While the Spark Application gets finished, I do not see any Index getting created in the elastic search, hence I do not see the data quality metrics getting populated.
Could you help me out with a possible solution?


Thank you,
Karan Gupta





RE: No Index Formation in Elastic Search

Posted by Karan Gupta <ka...@tavant.com>.
Hi Lionel,

Thank you for your quick revert.

I recreated the ES index as you suggested.
I no longer see any errors on the Griffin console as I did earlier.
But I don’t see any documents in the ES index either…
The jobs are running and completing, though, and HDFS has the latest job run metrics.

Any suggestions here?

Env.json has "method": "post" for ES persist part.
Should it be POST?

Thanks,
Best,
Karan
From: Lionel Liu <li...@apache.org>
Sent: Friday, May 11, 2018 3:45 PM
To: Karan Gupta <ka...@tavant.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

I've double-checked my environment. Sorry for the last reply; I pasted the old version.
In the current version, the metric looks like this:
{
          "name" : "accu_job",
          "tmst" : 1524812400000,
          "value" : {
            "total" : 125000,
            "miss" : 505,
            "matched" : 124495
          }
}

I curl for mapping schema by this command:
curl -XGET '<ES IP>:9200/_mapping?pretty=true'

And get the schema like this:
{
  "griffin" : {
    "mappings" : {
      "accuracy" : {
        "properties" : {
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "tmst" : {
            "type" : "long"
          },
          "value" : {
            "properties" : {
              "matched" : {
                "type" : "long"
              },
              "miss" : {
                "type" : "long"
              },
              "total" : {
                "type" : "long"
              }
            }
          }
        }
      }
    }
  }
}

It's a bit different from the metrics persisted on hdfs: "name" corresponds to "metricName", "tmst" corresponds to "timestamp", and the "value" fields are exactly the same.
{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}

For the details you can refer to:
https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HdfsPersist.scala#L334
https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HttpPersist.scala#L110

There might be some modifications in a later version to refactor the metrics schema; they will be highlighted in the release notes.


Hope this helps you.

Thanks,
Lionel

On Fri, May 11, 2018 at 5:52 PM, Karan Gupta <ka...@tavant.com>> wrote:
Hi,

Following is a sample JSON that is stored in HDFS by Griffin.
It resides in : hdfs:///griffin/streaming/persist/job_names/1525804920000/_METRICS

There are also _LOG, _START, __missRecords files created for each Job. I assume they are not meant for storage in ES.

Sample JSON:
{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}

This does not match the “schema” that you have outlined below.

Are we using an older version of Griffin? Can you help me with some clarity?

Thanks,
Best,
Karan
From: Lionel Liu <li...@apache.org>>
Sent: Wednesday, May 9, 2018 11:36 AM
To: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>; Karan Gupta <ka...@tavant.com>>

Subject: Re: No Index Formation in Elastic Search

Hi Karan,

Sorry for the missing field "__tmst", which is the timestamp with each output value record.
The mappings schema should be:

{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "__tmst": {"type": "long"},
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}

Thanks,
Lionel

On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <ka...@tavant.com>> wrote:
Hi Lionel,

I tried the below CURL which you sent me

curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type: application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" : {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties": {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type": "long"}}}}}}}'

When I try to GET the indexes, I can see that griffin index has been created in the elastic search. Then I ran the service jar again but I could not see DQ Metric getting populated.

Am I missing something here?

Thank you,
Karan Gupta



From: Lionel Liu <li...@apache.org>>
Sent: Friday, May 4, 2018 6:29 PM
To: Karan Gupta <ka...@tavant.com>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For accuracy, you can try mappings like this:

curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}'

The metric schema is like this:

{
       "name": "accuracy",
       "tmst": 1525320600000,
       "value": {
              "total": 100000,
              "miss": 200,
              "matched": 99800
       }
}



For profiling, you may need different mappings.

In our wiki, you can get the metric schema here:
https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema

As far as I know, you don't need to create indices in ES manually; it will create the mappings from the first value posted. That's what we do in our docker image, and it works.


Thanks,
Lionel


On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <ka...@tavant.com>>> wrote:
Hi Lionel,

We are not using Docker Image, hence we want to set it up manually.
Could you provide us the “CREATE” statement for griffin indices along with “mappings”.


Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>>>
Sent: Friday, May 4, 2018 2:56 PM

To: Karan Gupta <ka...@tavant.com>>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

In our docker image, we only configured 'http.cors.enabled: true' and 'http.cors.allow-origin: "*"' in elasticsearch.yml, as in the Dockerfile: https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
That's all we've done for ES configuration, without any other initialization. And when the spark application posts metrics to ES directly, it succeeds.

ES will generate the indices by the first value you post to it.

Thanks,
Lionel

On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com>>> wrote:
HI Lionel,

The metrics is being persisted in HDFS… This is good progress for us. Thank you for all your valuable help.

We created an index for Griffin but we were not sure about what mappings we should use.
Until we created this, we never got this index auto-created in ES…..
And now that we have created the index, there are errors which are suggestive of missing “mappings”

Is there an auto index create property that we need to enable somewhere in ES?
I could not find anything in the config yml file though….

Thank you,
Karan Gupta
From: Lionel Liu <li...@apache.org>>>
Sent: Friday, May 4, 2018 2:11 PM

To: Karan Gupta <ka...@tavant.com>>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
[Answer] The metrics are persisted directly from spark application.

Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
[Answer] I think you can modify "localhost" to the ip address of ES.

But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?
[Answer] You don't need to create the indices in ES; ES will create them when you post metrics to it.

For the "email" and "sms" parameters, they are not enabled in this version, you can just ignore them in env.json.

BTW, has the metrics been persisted on HDFS?

Thanks,
Lionel



On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com>>> wrote:
Hi,

Thank you for the detail.

In env.json, we have specified both HDFS and HTTP.
For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?

One more:

Yesterday we found that “email” and “sms” parts of the env.json are not configured properly.
They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not expect a List…
This was causing Spark jobs not to launch.
We edited the env.json accordingly…. We hope we did the right thing…
Can you confirm this?

Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>>>
Sent: Friday, May 4, 2018 11:46 AM
To: Karan Gupta <ka...@tavant.com>>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

First, we need to check whether griffin has finished successfully. What persist types did you configure in env.json? "log", "hdfs", "http"?
- "log": print the metrics in the application log.
- "hdfs": the metrics will be persisted in the hdfs path you've set.
- "http": post the metrics to the "api" you've set, which should be the elasticsearch endpoint by default.

You can choose several of them.
If "http" is not configured correctly, posting metrics to ES fails.
If "hdfs" is configured but you cannot find any metric persisted in the "path", griffin may not have finished the calculation correctly.
If "log" is configured, you can get the application log from yarn:
    yarn logs -applicationId <appId> > applog
Then read the applog and check whether any output metric was calculated.
If no metric is persisted by any of your persist types, you need to read the applog and find the error message. Then you can show it to me, and I'll help you track it down.

Thanks,
Lionel
On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com>>> wrote:
Hi Lionel,

While the Spark Application gets finished, I do not see any Index getting created in the elastic search, hence I do not see the data quality metrics getting populated.
Could you help me out with a possible solution?


Thank you,
Karan Gupta





Re: No Index Formation in Elastic Search

Posted by Lionel Liu <li...@apache.org>.
Hi Karan,

I've double-checked my environment. Sorry for the last reply; I pasted the
old version.
In the current version, the metric looks like this:
{
          "name" : "accu_job",
          "tmst" : 1524812400000,
          "value" : {
            "total" : 125000,
            "miss" : 505,
            "matched" : 124495
          }
}

I curl for mapping schema by this command:
curl -XGET '<ES IP>:9200/_mapping?pretty=true'

And get the schema like this:
{
  "griffin" : {
    "mappings" : {
      "accuracy" : {
        "properties" : {
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "tmst" : {
            "type" : "long"
          },
          "value" : {
            "properties" : {
              "matched" : {
                "type" : "long"
              },
              "miss" : {
                "type" : "long"
              },
              "total" : {
                "type" : "long"
              }
            }
          }
        }
      }
    }
  }
}
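
To compare a metric document with this mapping before posting it, a small checker can help. This is a minimal Python sketch, not anything Griffin or ES provides: check_against_mapping is a made-up helper, it only understands the "long", "text" and "keyword" types shown here, and treating an absent field as a problem is this sketch's own strictness (ES itself does not require every mapped field).

```python
# The "accuracy" properties from the mapping above, trimmed to what is checked.
ACCURACY_PROPERTIES = {
    "name": {"type": "text"},
    "tmst": {"type": "long"},
    "value": {
        "properties": {
            "matched": {"type": "long"},
            "miss": {"type": "long"},
            "total": {"type": "long"},
        }
    },
}


def check_against_mapping(doc, properties, prefix=""):
    """Return a list of problems found when comparing doc with the mapping."""
    problems = []
    for field, spec in properties.items():
        path = prefix + field
        if field not in doc:
            problems.append("missing field: " + path)
            continue
        value = doc[field]
        if "properties" in spec:
            # Nested object: recurse into its own properties.
            if isinstance(value, dict):
                problems.extend(
                    check_against_mapping(value, spec["properties"], path + ".")
                )
            else:
                problems.append("expected object: " + path)
        elif spec.get("type") == "long":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append("expected long: " + path)
        elif spec.get("type") in ("text", "keyword"):
            if not isinstance(value, str):
                problems.append("expected string: " + path)
    return problems


metric = {
    "name": "accu_job",
    "tmst": 1524812400000,
    "value": {"total": 125000, "miss": 505, "matched": 124495},
}
print(check_against_mapping(metric, ACCURACY_PROPERTIES))
```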

It's a bit different from the metrics persisted on hdfs: "name" corresponds
to "metricName", "tmst" corresponds to "timestamp", and the "value" fields
are exactly the same.

{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}
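
The field correspondence can be written down as a short Python sketch that turns one line of an HDFS _METRICS file into the shape of the ES document. The helper name hdfs_metric_to_es is made up for illustration; Griffin does this in its own persist classes, and this is not that code.

```python
import json

# HDFS field name -> ES field name, as described above.
FIELD_RENAMES = {"metricName": "name", "timestamp": "tmst"}


def hdfs_metric_to_es(line):
    """Convert one HDFS metric record (a JSON string) to the ES document shape."""
    record = json.loads(line)
    # Rename the two top-level fields; "value" is passed through unchanged.
    return {FIELD_RENAMES.get(key, key): val for key, val in record.items()}


hdfs_line = (
    '{"metricName":"job_names","timestamp":1525804920000,'
    '"value":{"total":19,"miss":2,"matched":17}}'
)
es_doc = hdfs_metric_to_es(hdfs_line)
print(json.dumps(es_doc))
```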


For the details you can refer to:

https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HdfsPersist.scala#L334

https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HttpPersist.scala#L110


There might be some modifications in a later version to refactor the
metrics schema; they will be highlighted in the release notes.


Hope this helps you.

Thanks,
Lionel

On Fri, May 11, 2018 at 5:52 PM, Karan Gupta <ka...@tavant.com> wrote:

> Hi,
>
>
>
> Following is a sample JSON that is stored in HDFS by Griffin.
>
> It resides in : hdfs:///griffin/streaming/persist/job_names/1525804920000/_METRICS
>
>
>
> There are also _LOG, _START, __missRecords files created for each Job. I
> assume they are not meant for storage in ES.
>
>
>
> Sample JSON:
>
> {"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}
>
>
>
> This does not match the “schema” that you have outlined below.
>
>
>
> Are we using an older version of Griffin? Can you help me with some
> clarity?
>
>
>
> Thanks,
>
> Best,
>
> Karan
>
> *From:* Lionel Liu <li...@apache.org>
> *Sent:* Wednesday, May 9, 2018 11:36 AM
> *To:* dev@griffin.incubator.apache.org; Karan Gupta <
> karan.gupta@tavant.com>
>
> *Subject:* Re: No Index Formation in Elastic Search
>
>
>
> Hi Karan,
>
>
>
> Sorry for the missing field "__tmst", which is the timestamp with each
> output value record.
>
> The mappings schema should be:
>
>
>
> {
>   "mappings": {
>     "accuracy": {
>       "properties": {
>
>         "name" : {"type": "keyword"},
>
>         "tmst" : {"type": "long"},
>         "value" : {
>           "properties": {
>
>             "__tmst": {"type": "long"},
>
>             "total": {"type": "long"},
>             "miss": {"type": "long"},
>             "matched": {"type": "long"}
>           }
>         }
>       }
>     }
>   }
> }
>
>
>
> Thanks,
>
> Lionel
>
>
>
> On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <ka...@tavant.com>
> wrote:
>
> Hi Lionel,
>
> I tried the below CURL which you sent me
>
> curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type: application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" : {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties": {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type": "long"}}}}}}}'
>
> When I try to GET the indexes, I can see that griffin index has been
> created in the elastic search. Then I ran the service jar again but I could
> not see DQ Metric getting populated.
>
> Am I missing something here?
>
> Thank you,
> Karan Gupta
>
>
>
> From: Lionel Liu <li...@apache.org>
> Sent: Friday, May 4, 2018 6:29 PM
> To: Karan Gupta <ka...@tavant.com>
> Cc: dev@griffin.incubator.apache.org
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> For accuracy, you can try mappings like this:
>
> curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{
>   "mappings": {
>     "accuracy": {
>       "properties": {
>         "name" : {"type": "keyword"},
>         "tmst" : {"type": "long"},
>         "value" : {
>           "properties": {
>             "total": {"type": "long"},
>             "miss": {"type": "long"},
>             "matched": {"type": "long"}
>           }
>         }
>       }
>     }
>   }
> }'
>
> The metric schema is like this:
>
> {
>        "name": "accuracy",
>        "tmst": 1525320600000,
>        "value": {
>               "total": 100000,
>               "miss": 200,
>               "matched": 99800
>        }
> }
>
>
>
> For profiling, you may need another mappings.
>
> In our wiki, you can get the metric schema here:
> https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema
>
> As I know, ES doesn't need to create indices manually, it will create the
> mappings by the first value posted. That's what we do in our docker image,
> and it works.
>
>
> Thanks,
> Lionel
>
>
> On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <karan.gupta@tavant.com<
> mailto:karan.gupta@tavant.com>> wrote:
> Hi Lionel,
>
> We are not using Docker Image, hence we want to set it up manually.
> Could you provide us the “CREATE” statement for griffin indices along with
> “mappings”.
>
>
> Thank you,
> Karan Gupta
>
> From: Lionel Liu <li...@apache.org>>
> Sent: Friday, May 4, 2018 2:56 PM
>
> To: Karan Gupta <ka...@tavant.com>>
> Cc: dev@griffin.incubator.apache.org<mailto:dev@griffin.
> incubator.apache.org>
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> In our docker image, we only configured 'http.cors.enabled: true' and
> 'http.cors.allow-origin: "*"' in elasticsearch.yml, as the Dockerfile:
> https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
> That's all the things we've done for ES configuration, without any other
> initialization. And when the spark application post metrics to ES directly,
> it succeed.
>
> ES will generate the indices by the first value you post to it.
>
> Thanks,
> Lionel
>
> On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <karan.gupta@tavant.com<
> mailto:karan.gupta@tavant.com>> wrote:
> HI Lionel,
>
> The metrics is being persisted in HDFS… This is good progress for us.
> Thank you for all your valuable help.
>
> We created an index for Griffin but we were not sure about what mappings
> we should use.
> Until we created this, we never got this index auto-created in ES…..
> And now that we have created the index, there are errors which are
> suggestive of missing “mappings”
>
> Is there an auto index create property that we need to enable somewhere in
> ES?
> I could not find anything in the config yml file though….
>
> Thank you,
> Karan Gupta
> From: Lionel Liu <li...@apache.org>>
> Sent: Friday, May 4, 2018 2:11 PM
>
> To: Karan Gupta <ka...@tavant.com>>
> Cc: dev@griffin.incubator.apache.org<mailto:dev@griffin.
> incubator.apache.org>
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> For HTTP persistence, are the metrics persisted directly from “Spark”?
> (or) Griffin services writes into it?
> [Answer] The metrics are persisted directly from spark application.
>
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> from Griffin service, it will work…. But from Spark executors, it wont work
> as localhost resolves to executor host)
> [Answer] I think you can modify "localhost" to the ip address of ES.
>
> But we have not created any index in “ES” called “griffin” or
> “accuracy”….? What should we be doing here?
> [Answer] You don't need to create the indices in ES, ES will create it
> when post metrics to it.
>
> For the "email" and "sms" parameters, they are not enabled in this
> version, you can just ignore them in env.json.
>
> BTW, has the metrics been persisted on HDFS?
>
> Thanks,
> Lionel
>
>
>
> On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <karan.gupta@tavant.com<
> mailto:karan.gupta@tavant.com>> wrote:
> Hi,
>
> Thank you for the detail.
>
> In env.json, we have specified both HDFS and HTTP.
> For HTTP persistence, are the metrics persisted directly from “Spark”?
> (or) Griffin services writes into it?
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> from Griffin service, it will work…. But from Spark executors, it wont work
> as localhost resolves to executor host)
> But we have not created any index in “ES” called “griffin” or
> “accuracy”….? What should we be doing here?
>
> One more:
>
> Yesterday we found that “email” and “sms” parts of the env.json are not
> configured properly.
> They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not
> expect a List…
> This was causing Spark jobs not to launch.
> We edited the env.json accordingly…. We hope we did the right thing…
> Can you confirm this?
>
> Thank you,
> Karan Gupta
>
> From: Lionel Liu <li...@apache.org>>
> Sent: Friday, May 4, 2018 11:46 AM
> To: Karan Gupta <ka...@tavant.com>>
> Cc: dev@griffin.incubator.apache.org<mailto:dev@griffin.
> incubator.apache.org>
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> First, we need to check has griffin successfully finished. What persist
> types did you configure in env.json? "log", "hdfs", "http"?
> - "log": print the metrics in application log.
> - "hdfs": the metrics will be persisted in hdfs path you've set.
> - "http": post the metrics to the "api" you've set, which should be the
> elasticsearch endpoint by default.
>
> You can choose multiple of them.
> If "http" is not configured correctly, post metrics to ES fails.
> If "hdfs" is configured, but you can not get any metric persisted in the
> "path", maybe griffin has not finish the calculation correctly.
> If "log" is configured, you can get the application log from yarn:
>     yarn logs -applicationId <appId> > applog
> Then read the applog, find if there's any output metric calculated.
> If there's no metric persisted by any type of your persist configuration,
> you need to read the applog, and find the error message. Then you can show
> it to me, I'll help you find it.
>
> Thanks,
> Lionel
>
> On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <karan.gupta@tavant.com<
> mailto:karan.gupta@tavant.com>> wrote:
> Hi Lionel,
>
> While the Spark Application gets finished, I do not see any Index getting
> created in the elastic search, hence I do not see the data quality metrics
> getting populated.
> Could you help me out with a possible solution?
>
>
> Thank you,
> Karan Gupta

RE: No Index Formation in Elastic Search

Posted by Karan Gupta <ka...@tavant.com>.
Hi,

Below is a sample JSON record that Griffin stores in HDFS.
It resides at: hdfs:///griffin/streaming/persist/job_names/1525804920000/_METRICS

There are also _LOG, _START, and __missRecords files created for each job. I assume they are not meant to be stored in ES.

Sample JSON:
{"metricName":"job_names","timestamp":1525804920000,"value":{"total":19,"miss":2,"matched":17}}

This does not match the “schema” that you outlined below.

Are we using an older version of Griffin? Could you clarify?
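
To make the mismatch concrete, here is a small Python sketch that compares the top-level keys of the HDFS record above with the top-level keys of the metric schema quoted later in this thread ("name", "tmst", "value"). The helper name diff_keys is made up for illustration; the sketch only reports key differences and makes no claim about how Griffin itself serializes metrics for HDFS or for ES.

```python
import json

# Record copied from the _METRICS file shown above.
hdfs_record = json.loads(
    '{"metricName":"job_names","timestamp":1525804920000,'
    '"value":{"total":19,"miss":2,"matched":17}}'
)

# Top-level keys of the metric schema quoted in this thread.
SCHEMA_KEYS = {"name", "tmst", "value"}


def diff_keys(record, expected_keys):
    """Return (missing, unexpected) top-level keys, each as a sorted list."""
    actual = set(record)
    return sorted(expected_keys - actual), sorted(actual - expected_keys)


missing, unexpected = diff_keys(hdfs_record, SCHEMA_KEYS)
print("missing from record:", missing)
print("not in schema:", unexpected)
```

The nested "value" object has the same fields in both shapes; only the two outer field names differ.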

Thanks,
Best,
Karan
From: Lionel Liu <li...@apache.org>
Sent: Wednesday, May 9, 2018 11:36 AM
To: dev@griffin.incubator.apache.org; Karan Gupta <ka...@tavant.com>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

Sorry for the missing field "__tmst", which is the timestamp with each output value record.
The mappings schema should be:

{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "__tmst": {"type": "long"},
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}

Thanks,
Lionel

On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <ka...@tavant.com>> wrote:
Hi Lionel,

I tried the below CURL which you sent me

curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type: application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" : {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties": {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type": "long"}}}}}}}'

When I try to GET the indexes, I can see that griffin index has been created in the elastic search. Then I ran the service jar again but I could not see DQ Metric getting populated.

Am I missing something here?

Thank you,
Karan Gupta



From: Lionel Liu <li...@apache.org>>
Sent: Friday, May 4, 2018 6:29 PM
To: Karan Gupta <ka...@tavant.com>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For accuracy, you can try mappings like this:

curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}'

The metric schema is like this:

{
  "name": "accuracy",
  "tmst": 1525320600000,
  "value": {
    "total": 100000,
    "miss": 200,
    "matched": 99800
  }
}



For profiling, you may need a different mapping.

In our wiki, you can get the metric schema here:
https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema

As far as I know, indices don't need to be created manually in ES; it will create the mappings from the first value posted. That's what we rely on in our docker image, and it works.


Thanks,
Lionel


On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <ka...@tavant.com>>> wrote:
Hi Lionel,

We are not using Docker Image, hence we want to set it up manually.
Could you provide us the “CREATE” statement for griffin indices along with “mappings”.


Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>>>
Sent: Friday, May 4, 2018 2:56 PM

To: Karan Gupta <ka...@tavant.com>>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

In our docker image, we only configured 'http.cors.enabled: true' and 'http.cors.allow-origin: "*"' in elasticsearch.yml, as in the Dockerfile: https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
That's all we've done for ES configuration, without any other initialization. When the Spark application posts metrics to ES directly, it succeeds.

ES will generate the indices by the first value you post to it.

Thanks,
Lionel

On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com>>> wrote:
HI Lionel,

The metrics is being persisted in HDFS… This is good progress for us. Thank you for all your valuable help.

We created an index for Griffin but we were not sure about what mappings we should use.
Until we created this, we never got this index auto-created in ES…..
And now that we have created the index, there are errors which are suggestive of missing “mappings”

Is there an auto index create property that we need to enable somewhere in ES?
I could not find anything in the config yml file though….

Thank you,
Karan Gupta
From: Lionel Liu <li...@apache.org>>>
Sent: Friday, May 4, 2018 2:11 PM

To: Karan Gupta <ka...@tavant.com>>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
[Answer] The metrics are persisted directly from spark application.

Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
[Answer] I think you can modify "localhost" to the ip address of ES.
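
As a minimal sketch of that change, the Python snippet below swaps the host in the "api" URL while keeping the port and path. The function name replace_host and the address 10.0.0.5 are placeholders invented for this example; substitute the address at which your Spark executors can actually reach ES.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_host(url, new_host):
    """Return `url` with its hostname swapped for `new_host`, keeping port and path."""
    parts = urlsplit(url)
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# 10.0.0.5 is a made-up address standing in for the real ES host.
api = replace_host("http://localhost:9200/griffin/accuracy", "10.0.0.5")
print(api)
```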

But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?
[Answer] You don't need to create the indices in ES, ES will create it when post metrics to it.

For the "email" and "sms" parameters, they are not enabled in this version, you can just ignore them in env.json.

BTW, has the metrics been persisted on HDFS?

Thanks,
Lionel



On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com>>> wrote:
Hi,

Thank you for the detail.

In env.json, we have specified both HDFS and HTTP.
For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?

One more:

Yesterday we found that “email” and “sms” parts of the env.json are not configured properly.
They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not expect a List…
This was causing Spark jobs not to launch.
We edited the env.json accordingly…. We hope we did the right thing…
Can you confirm this?

Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>>>
Sent: Friday, May 4, 2018 11:46 AM
To: Karan Gupta <ka...@tavant.com>>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

First, we need to check whether Griffin finished successfully. Which persist types did you configure in env.json? "log", "hdfs", "http"?
- "log": prints the metrics in the application log.
- "hdfs": persists the metrics to the HDFS path you've set.
- "http": posts the metrics to the "api" you've set, which should be the Elasticsearch endpoint by default.

You can configure more than one of them.
If "http" is not configured correctly, posting metrics to ES fails.
If "hdfs" is configured but no metrics are persisted in the "path", Griffin may not have finished the calculation correctly.
If "log" is configured, you can get the application log from YARN:
    yarn logs -applicationId <appId> > applog
Then read the applog and check whether any output metrics were calculated.
If no metrics are persisted by any of your configured persist types, read the applog and look for the error message. Then you can show it to me, and I'll help you track it down.
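
The checks above can be sketched as a small Python script that inspects the persist section of env.json before a job is submitted. The JSON layout used here (a "persist" list whose entries have "type" and "config" keys) is an assumption made for illustration and may not match your Griffin version; only the persist type names and the "api" and "path" settings come from this thread. The function name check_persist is likewise hypothetical.

```python
import json
from urllib.parse import urlsplit

# Hypothetical env.json fragment; the real layout may differ between Griffin versions.
ENV_JSON = """
{
  "persist": [
    {"type": "log", "config": {}},
    {"type": "hdfs", "config": {"path": "hdfs:///griffin/streaming/persist"}},
    {"type": "http", "config": {"api": "http://localhost:9200/griffin/accuracy"}}
  ]
}
"""

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def check_persist(env):
    """Return a list of human-readable problems found in the persist section."""
    problems = []
    for entry in env.get("persist", []):
        kind = entry.get("type")
        config = entry.get("config", {})
        if kind == "hdfs" and not config.get("path"):
            problems.append("hdfs persist has no path")
        if kind == "http":
            api = config.get("api", "")
            if not api:
                problems.append("http persist has no api")
            elif urlsplit(api).hostname in LOCAL_HOSTS:
                # Spark executors would resolve this to their own host, not to ES.
                problems.append(f"http api {api} points at the local host")
    return problems


problems = check_persist(json.loads(ENV_JSON))
for problem in problems:
    print("WARNING:", problem)
```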

Thanks,
Lionel

On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com>>> wrote:
Hi Lionel,

While the Spark Application gets finished, I do not see any Index getting created in the elastic search, hence I do not see the data quality metrics getting populated.
Could you help me out with a possible solution?


Thank you,
Karan Gupta

Re: No Index Formation in Elastic Search

Posted by Lionel Liu <li...@apache.org>.
Hi Karan,

Sorry, I left out the field "__tmst", which is the timestamp attached to each
output value record.
The mappings schema should be:

{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "__tmst": {"type": "long"},
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}
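
For anyone who prefers a script to curl, here is a hedged Python sketch that builds the same index-creation request with this mapping. The host es-host.example:9200 is a placeholder and build_create_index_request is a name made up for this example; the request is only constructed, not sent, so nothing touches a cluster until you uncomment the last line.

```python
import json
import urllib.request

# Placeholder address: replace with the real Elasticsearch host and port.
ES_BASE_URL = "http://es-host.example:9200"

# Mapping body as given above, including the "__tmst" field.
MAPPING = {
    "mappings": {
        "accuracy": {
            "properties": {
                "name": {"type": "keyword"},
                "tmst": {"type": "long"},
                "value": {
                    "properties": {
                        "__tmst": {"type": "long"},
                        "total": {"type": "long"},
                        "miss": {"type": "long"},
                        "matched": {"type": "long"},
                    }
                },
            }
        }
    }
}


def build_create_index_request(base_url, index, mapping):
    """Build (but do not send) a PUT request that creates `index` with `mapping`."""
    return urllib.request.Request(
        url=f"{base_url}/{index}?pretty=true",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(ES_BASE_URL, "griffin", MAPPING)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```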

Thanks,
Lionel

On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <ka...@tavant.com> wrote:

> Hi Lionel,
>
> I tried the below CURL which you sent me
>
> curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type:
> application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" :
> {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties":
> {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type":
> "long"}}}}}}}'
>
> When I try to GET the indexes, I can see that griffin index has been
> created in the elastic search. Then I ran the service jar again but I could
> not see DQ Metric getting populated.
>
> Am I missing something here?
>
> Thank you,
> Karan Gupta
>
>
>
> From: Lionel Liu <li...@apache.org>
> Sent: Friday, May 4, 2018 6:29 PM
> To: Karan Gupta <ka...@tavant.com>
> Cc: dev@griffin.incubator.apache.org
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> For accuracy, you can try mappings like this:
>
> curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{
>   "mappings": {
>     "accuracy": {
>       "properties": {
>         "name" : {"type": "keyword"},
>         "tmst" : {"type": "long"},
>         "value" : {
>           "properties": {
>             "total": {"type": "long"},
>             "miss": {"type": "long"},
>             "matched": {"type": "long"}
>           }
>         }
>       }
>     }
>   }
> }'
>
> The metric schema is like this:
>
> {
>   "name": "accuracy",
>   "tmst": 1525320600000,
>   "value": {
>     "total": 100000,
>     "miss": 200,
>     "matched": 99800
>   }
> }
>
>
>
> For profiling, you may need another mappings.
>
> In our wiki, you can get the metric schema here:
> https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema
>
> As I know, ES doesn't need to create indices manually, it will create the
> mappings by the first value posted. That's what we do in our docker image,
> and it works.
>
>
> Thanks,
> Lionel
>
>
> On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <karan.gupta@tavant.com<
> mailto:karan.gupta@tavant.com>> wrote:
> Hi Lionel,
>
> We are not using Docker Image, hence we want to set it up manually.
> Could you provide us the “CREATE” statement for griffin indices along with
> “mappings”.
>
>
> Thank you,
> Karan Gupta
>
> From: Lionel Liu <li...@apache.org>>
> Sent: Friday, May 4, 2018 2:56 PM
>
> To: Karan Gupta <ka...@tavant.com>>
> Cc: dev@griffin.incubator.apache.org<mailto:dev@griffin.
> incubator.apache.org>
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> In our docker image, we only configured 'http.cors.enabled: true' and
> 'http.cors.allow-origin: "*"' in elasticsearch.yml, as the Dockerfile:
> https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
> That's all the things we've done for ES configuration, without any other
> initialization. And when the spark application post metrics to ES directly,
> it succeed.
>
> ES will generate the indices by the first value you post to it.
>
> Thanks,
> Lionel
>
> On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <karan.gupta@tavant.com<
> mailto:karan.gupta@tavant.com>> wrote:
> HI Lionel,
>
> The metrics is being persisted in HDFS… This is good progress for us.
> Thank you for all your valuable help.
>
> We created an index for Griffin but we were not sure about what mappings
> we should use.
> Until we created this, we never got this index auto-created in ES…..
> And now that we have created the index, there are errors which are
> suggestive of missing “mappings”
>
> Is there an auto index create property that we need to enable somewhere in
> ES?
> I could not find anything in the config yml file though….
>
> Thank you,
> Karan Gupta
> From: Lionel Liu <li...@apache.org>>
> Sent: Friday, May 4, 2018 2:11 PM
>
> To: Karan Gupta <ka...@tavant.com>>
> Cc: dev@griffin.incubator.apache.org<mailto:dev@griffin.
> incubator.apache.org>
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> For HTTP persistence, are the metrics persisted directly from “Spark”?
> (or) Griffin services writes into it?
> [Answer] The metrics are persisted directly from spark application.
>
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> from Griffin service, it will work…. But from Spark executors, it wont work
> as localhost resolves to executor host)
> [Answer] I think you can modify "localhost" to the ip address of ES.
>
> But we have not created any index in “ES” called “griffin” or
> “accuracy”….? What should we be doing here?
> [Answer] You don't need to create the indices in ES, ES will create it
> when post metrics to it.
>
> For the "email" and "sms" parameters, they are not enabled in this
> version, you can just ignore them in env.json.
>
> BTW, has the metrics been persisted on HDFS?
>
> Thanks,
> Lionel
>
>
>
> On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <karan.gupta@tavant.com<
> mailto:karan.gupta@tavant.com>> wrote:
> Hi,
>
> Thank you for the detail.
>
> In env.json, we have specified both HDFS and HTTP.
> For HTTP persistence, are the metrics persisted directly from “Spark”?
> (or) Griffin services writes into it?
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> from Griffin service, it will work…. But from Spark executors, it wont work
> as localhost resolves to executor host)
> But we have not created any index in “ES” called “griffin” or
> “accuracy”….? What should we be doing here?
>
> One more:
>
> Yesterday we found that “email” and “sms” parts of the env.json are not
> configured properly.
> They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not
> expect a List…
> This was causing Spark jobs not to launch.
> We edited the env.json accordingly…. We hope we did the right thing…
> Can you confirm this?
>
> Thank you,
> Karan Gupta
>
> From: Lionel Liu <li...@apache.org>>
> Sent: Friday, May 4, 2018 11:46 AM
> To: Karan Gupta <ka...@tavant.com>>
> Cc: dev@griffin.incubator.apache.org<mailto:dev@griffin.
> incubator.apache.org>
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> First, we need to check has griffin successfully finished. What persist
> types did you configure in env.json? "log", "hdfs", "http"?
> - "log": print the metrics in application log.
> - "hdfs": the metrics will be persisted in hdfs path you've set.
> - "http": post the metrics to the "api" you've set, which should be the
> elasticsearch endpoint by default.
>
> You can choose multiple of them.
> If "http" is not configured correctly, post metrics to ES fails.
> If "hdfs" is configured, but you can not get any metric persisted in the
> "path", maybe griffin has not finish the calculation correctly.
> If "log" is configured, you can get the application log from yarn:
>     yarn logs -applicationId <appId> > applog
> Then read the applog, find if there's any output metric calculated.
> If there's no metric persisted by any type of your persist configuration,
> you need to read the applog, and find the error message. Then you can show
> it to me, I'll help you find it.
>
> Thanks,
> Lionel
>
>
> On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <karan.gupta@tavant.com<
> mailto:karan.gupta@tavant.com>> wrote:
> Hi Lionel,
>
> While the Spark Application gets finished, I do not see any Index getting
> created in the elastic search, hence I do not see the data quality metrics
> getting populated.
> Could you help me out with a possible solution?
>
>
> Thank you,
> Karan Gupta

RE: No Index Formation in Elastic Search

Posted by Karan Gupta <ka...@tavant.com>.
Hi Lionel,

I tried the curl command below, which you sent me:

curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type: application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" : {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties": {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type": "long"}}}}}}}'

When I GET the indices, I can see that the griffin index has been created in Elasticsearch. I then ran the service jar again, but I still could not see the DQ metrics being populated.

Am I missing something here?

Thank you,
Karan Gupta



From: Lionel Liu <li...@apache.org>
Sent: Friday, May 4, 2018 6:29 PM
To: Karan Gupta <ka...@tavant.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For accuracy, you can try mappings like this:

curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}'

The metric schema is like this:

{
  "name": "accuracy",
  "tmst": 1525320600000,
  "value": {
    "total": 100000,
    "miss": 200,
    "matched": 99800
  }
}



For profiling, you may need another mappings.

In our wiki, you can get the metric schema here:
https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema

As I know, ES doesn't need to create indices manually, it will create the mappings by the first value posted. That's what we do in our docker image, and it works.


Thanks,
Lionel


On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <ka...@tavant.com>> wrote:
Hi Lionel,

We are not using Docker Image, hence we want to set it up manually.
Could you provide us the “CREATE” statement for griffin indices along with “mappings”.


Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>>
Sent: Friday, May 4, 2018 2:56 PM

To: Karan Gupta <ka...@tavant.com>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

In our docker image, we only configured 'http.cors.enabled: true' and 'http.cors.allow-origin: "*"' in elasticsearch.yml, as in the Dockerfile: https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
That's all we've done for ES configuration, without any other initialization. When the Spark application posts metrics to ES directly, it succeeds.

ES will generate the indices by the first value you post to it.

Thanks,
Lionel

On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com>> wrote:
HI Lionel,

The metrics is being persisted in HDFS… This is good progress for us. Thank you for all your valuable help.

We created an index for Griffin but we were not sure about what mappings we should use.
Until we created this, we never got this index auto-created in ES…..
And now that we have created the index, there are errors which are suggestive of missing “mappings”

Is there an auto index create property that we need to enable somewhere in ES?
I could not find anything in the config yml file though….

Thank you,
Karan Gupta
From: Lionel Liu <li...@apache.org>>
Sent: Friday, May 4, 2018 2:11 PM

To: Karan Gupta <ka...@tavant.com>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
[Answer] The metrics are persisted directly from spark application.

Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
[Answer] I think you can modify "localhost" to the ip address of ES.

But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?
[Answer] You don't need to create the indices in ES, ES will create it when post metrics to it.

For the "email" and "sms" parameters, they are not enabled in this version, you can just ignore them in env.json.

BTW, has the metrics been persisted on HDFS?

Thanks,
Lionel



On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com>> wrote:
Hi,

Thank you for the detail.

In env.json, we have specified both HDFS and HTTP.
For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?

One more:

Yesterday we found that “email” and “sms” parts of the env.json are not configured properly.
They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not expect a List…
This was causing Spark jobs not to launch.
We edited the env.json accordingly…. We hope we did the right thing…
Can you confirm this?

Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>>
Sent: Friday, May 4, 2018 11:46 AM
To: Karan Gupta <ka...@tavant.com>>
Cc: dev@griffin.incubator.apache.org<ma...@griffin.incubator.apache.org>
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

First, we need to check whether Griffin finished successfully. Which persist types did you configure in env.json? "log", "hdfs", "http"?
- "log": prints the metrics in the application log.
- "hdfs": persists the metrics to the HDFS path you've set.
- "http": posts the metrics to the "api" you've set, which should be the Elasticsearch endpoint by default.

You can configure more than one of them.
If "http" is not configured correctly, posting metrics to ES fails.
If "hdfs" is configured but no metrics are persisted in the "path", Griffin may not have finished the calculation correctly.
If "log" is configured, you can get the application log from YARN:
    yarn logs -applicationId <appId> > applog
Then read the applog and check whether any output metrics were calculated.
If no metrics are persisted by any of your configured persist types, read the applog and look for the error message. Then you can show it to me, and I'll help you track it down.

Thanks,
Lionel


On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com>> wrote:
Hi Lionel,

While the Spark Application gets finished, I do not see any Index getting created in the elastic search, hence I do not see the data quality metrics getting populated.
Could you help me out with a possible solution?


Thank you,
Karan Gupta

Re: No Index Formation in Elastic Search

Posted by Lionel Liu <li...@apache.org>.
Hi Karan,

For accuracy, you can try mappings like this:

curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d  '{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}'

The metric schema is like this:

{
  "name": "accuracy",
  "tmst": 1525320600000,
  "value": {
    "total": 100000,
    "miss": 200,
    "matched": 99800
  }
}
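
As a rough illustration of that shape, the Python sketch below checks a document against it before it is posted. The function name check_accuracy_metric is made up for this example. Two things in it are assumptions read off the sample values in this thread rather than documented rules: that "tmst" is an epoch timestamp in milliseconds, and that "matched" equals "total" minus "miss".

```python
def check_accuracy_metric(doc):
    """Return a list of problems found in an accuracy metric document."""
    problems = []
    if not isinstance(doc.get("name"), str):
        problems.append("name must be a string")
    if not isinstance(doc.get("tmst"), int):
        problems.append("tmst must be an integer (assumed epoch milliseconds)")
    value = doc.get("value")
    if not isinstance(value, dict):
        problems.append("value must be an object")
        return problems
    for key in ("total", "miss", "matched"):
        if not isinstance(value.get(key), int):
            problems.append(f"value.{key} must be an integer")
    # Assumption taken from the sample values: matched = total - miss.
    if not problems and value["matched"] != value["total"] - value["miss"]:
        problems.append("matched != total - miss")
    return problems


metric = {
    "name": "accuracy",
    "tmst": 1525320600000,
    "value": {"total": 100000, "miss": 200, "matched": 99800},
}
print(check_accuracy_metric(metric))
```

An empty list means the document has the expected fields and types.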



For profiling, you may need a different mapping.

In our wiki, you can get the metric schema here:
https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema

As far as I know, indices don't need to be created manually in ES; it will create
the mappings from the first value posted. That's what we rely on in our docker image,
and it works.


Thanks,
Lionel


On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <ka...@tavant.com> wrote:

> Hi Lionel,
>
>
>
> We are not using Docker Image, hence we want to set it up manually.
> Could you provide us the “CREATE” statement for griffin indices along with
> “mappings”.
>
>
>
>
>
> Thank you,
>
> Karan Gupta
>
>
>
> *From:* Lionel Liu <li...@apache.org>
> *Sent:* Friday, May 4, 2018 2:56 PM
>
> *To:* Karan Gupta <ka...@tavant.com>
> *Cc:* dev@griffin.incubator.apache.org
> *Subject:* Re: No Index Formation in Elastic Search
>
>
>
> Hi Karan,
>
>
>
> In our docker image, we only configured 'http.cors.enabled: true' and
> 'http.cors.allow-origin: "*"' in elasticsearch.yml, as the Dockerfile:
> https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
>
> That's all the things we've done for ES configuration, without any other
> initialization. And when the spark application post metrics to ES directly,
> it succeed.
>
>
>
> ES will generate the indices by the first value you post to it.
>
>
>
> Thanks,
>
> Lionel
>
>
>
> On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com>
> wrote:
>
> HI Lionel,
>
>
>
> The metrics is being persisted in HDFS… This is good progress for us.
> Thank you for all your valuable help.
>
>
>
> We created an index for Griffin but we were not sure about what mappings
> we should use.
>
> Until we created this, we never got this index auto-created in ES…..
>
> And now that we have created the index, there are errors which are
> suggestive of missing “mappings”
>
>
>
> Is there an auto index create property that we need to enable somewhere in
> ES?
>
> I could not find anything in the config yml file though….
>
>
>
> Thank you,
>
> Karan Gupta
>
> *From:* Lionel Liu <li...@apache.org>
> *Sent:* Friday, May 4, 2018 2:11 PM
>
>
> *To:* Karan Gupta <ka...@tavant.com>
> *Cc:* dev@griffin.incubator.apache.org
> *Subject:* Re: No Index Formation in Elastic Search
>
>
>
> Hi Karan,
>
>
>
> For HTTP persistence, are the metrics persisted directly from “Spark”?
> (or) Griffin services writes into it?
>
> [Answer] The metrics are persisted directly from spark application.
>
>
>
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> from Griffin service, it will work…. But from Spark executors, it wont work
> as localhost resolves to executor host)
>
> [Answer] I think you can modify "localhost" to the ip address of ES.
>
>
>
> But we have not created any index in “ES” called “griffin” or
> “accuracy”….? What should we be doing here?
>
> [Answer] You don't need to create the indices in ES, ES will create it
> when post metrics to it.
>
>
>
> For the "email" and "sms" parameters, they are not enabled in this
> version, you can just ignore them in env.json.
>
>
>
> BTW, has the metrics been persisted on HDFS?
>
>
>
> Thanks,
>
> Lionel
>
>
>
>
>
>
>
> On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com>
> wrote:
>
> Hi,
>
>
>
> Thank you for the detail.
>
>
>
> In env.json, we have specified both HDFS and HTTP.
>
> For HTTP persistence, are the metrics persisted directly from “Spark”?
> (or) Griffin services writes into it?
>
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> from Griffin service, it will work…. But from Spark executors, it wont work
> as localhost resolves to executor host)
>
> But we have not created any index in “ES” called “griffin” or
> “accuracy”….? What should we be doing here?
>
>
>
> One more:
>
>
>
> Yesterday we found that “email” and “sms” parts of the env.json are not
> configured properly.
>
> They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not
> expect a List…
>
> This was causing Spark jobs not to launch.
>
> We edited the env.json accordingly…. We hope we did the right thing…
>
> Can you confirm this?
>
>
>
> Thank you,
>
> Karan Gupta
>
>
>
> *From:* Lionel Liu <li...@apache.org>
> *Sent:* Friday, May 4, 2018 11:46 AM
> *To:* Karan Gupta <ka...@tavant.com>
> *Cc:* dev@griffin.incubator.apache.org
> *Subject:* Re: No Index Formation in Elastic Search
>
>
>
> Hi Karan,
>
>
>
> First, we need to check has griffin successfully finished. What persist
> types did you configure in env.json? "log", "hdfs", "http"?
>
> - "log": print the metrics in application log.
>
> - "hdfs": the metrics will be persisted in hdfs path you've set.
>
> - "http": post the metrics to the "api" you've set, which should be the
> elasticsearch endpoint by default.
>
>
>
> You can choose multiple of them.
>
> If "http" is not configured correctly, post metrics to ES fails.
>
> If "hdfs" is configured, but you can not get any metric persisted in the
> "path", maybe griffin has not finish the calculation correctly.
>
> If "log" is configured, you can get the application log from yarn:
>
>     yarn logs -applicationId <appId> > applog
>
> Then read the applog, find if there's any output metric calculated.
>
> If there's no metric persisted by any type of your persist configuration,
> you need to read the applog, and find the error message. Then you can show
> it to me, I'll help you find it.
>
>
>
> Thanks,
>
> Lionel
>
>
>
>
>
> On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com>
> wrote:
>
> Hi Lionel,
>
>
>
> While the Spark Application gets finished, I do not see any Index getting
> created in the elastic search, hence I do not see the data quality metrics
> getting populated.
>
> Could you help me out with a possible solution?
>
>
>
>
>
> Thank you,
>
> Karan Gupta

RE: No Index Formation in Elastic Search

Posted by Karan Gupta <ka...@tavant.com>.
Hi Lionel,

We are not using the Docker image, so we want to set it up manually.
Could you provide us the “CREATE” statement for the Griffin indices, along with the “mappings”?


Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>
Sent: Friday, May 4, 2018 2:56 PM
To: Karan Gupta <ka...@tavant.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

In our docker image, we only configured 'http.cors.enabled: true' and 'http.cors.allow-origin: "*"' in elasticsearch.yml, as the Dockerfile: https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
That's all the things we've done for ES configuration, without any other initialization. And when the spark application post metrics to ES directly, it succeed.

ES will generate the indices by the first value you post to it.

Thanks,
Lionel

On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com> wrote:
HI Lionel,

The metrics is being persisted in HDFS… This is good progress for us. Thank you for all your valuable help.

We created an index for Griffin but we were not sure about what mappings we should use.
Until we created this, we never got this index auto-created in ES…..
And now that we have created the index, there are errors which are suggestive of missing “mappings”

Is there an auto index create property that we need to enable somewhere in ES?
I could not find anything in the config yml file though….

Thank you,
Karan Gupta
From: Lionel Liu <li...@apache.org>
Sent: Friday, May 4, 2018 2:11 PM
To: Karan Gupta <ka...@tavant.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
[Answer] The metrics are persisted directly from spark application.

Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
[Answer] I think you can modify "localhost" to the ip address of ES.

But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?
[Answer] You don't need to create the indices in ES, ES will create it when post metrics to it.

For the "email" and "sms" parameters, they are not enabled in this version, you can just ignore them in env.json.

BTW, has the metrics been persisted on HDFS?

Thanks,
Lionel



On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com> wrote:
Hi,

Thank you for the detail.

In env.json, we have specified both HDFS and HTTP.
For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?

One more:

Yesterday we found that “email” and “sms” parts of the env.json are not configured properly.
They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not expect a List…
This was causing Spark jobs not to launch.
We edited the env.json accordingly…. We hope we did the right thing…
Can you confirm this?

Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>
Sent: Friday, May 4, 2018 11:46 AM
To: Karan Gupta <ka...@tavant.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

First, we need to check has griffin successfully finished. What persist types did you configure in env.json? "log", "hdfs", "http"?
- "log": print the metrics in application log.
- "hdfs": the metrics will be persisted in hdfs path you've set.
- "http": post the metrics to the "api" you've set, which should be the elasticsearch endpoint by default.

You can choose multiple of them.
If "http" is not configured correctly, post metrics to ES fails.
If "hdfs" is configured, but you can not get any metric persisted in the "path", maybe griffin has not finish the calculation correctly.
If "log" is configured, you can get the application log from yarn:
    yarn logs -applicationId <appId> > applog
Then read the applog, find if there's any output metric calculated.
If there's no metric persisted by any type of your persist configuration, you need to read the applog, and find the error message. Then you can show it to me, I'll help you find it.

Thanks,
Lionel


On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com> wrote:
Hi Lionel,

While the Spark Application gets finished, I do not see any Index getting created in the elastic search, hence I do not see the data quality metrics getting populated.
Could you help me out with a possible solution?


Thank you,
Karan Gupta




Re: No Index Formation in Elastic Search

Posted by Lionel Liu <li...@apache.org>.
Hi Karan,

In our Docker image, we only configured 'http.cors.enabled: true' and
'http.cors.allow-origin: "*"' in elasticsearch.yml, as in the Dockerfile:
https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
That's all we've done for ES configuration, without any other
initialization. And when the Spark application posts metrics to ES directly,
it succeeds.

ES will generate the indices by the first value you post to it.
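
A minimal sketch of how this could be checked by hand, assuming ES is reachable at a hypothetical address (es-host:9200) and using the griffin/accuracy path from the URL in this thread. The sample document fields are made up for illustration; the real ones come from the Griffin job. The ES setting that controls automatic index creation is action.auto_create_index, so if it has been disabled on a cluster, the first POST would not create the index. The include_defaults parameter may not be available on older ES versions.

```python
import json
import urllib.request

ES_BASE = "http://es-host:9200"  # assumption: replace with your ES address


def build_test_post(base_url, index="griffin", doc_type="accuracy"):
    """Build a POST request that indexes one sample document."""
    # Hypothetical sample metric, only here to trigger index creation.
    doc = {"name": "auto_create_check", "tmst": 1525413601000, "value": {"total": 1}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/{doc_type}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_settings_get(base_url):
    """Build a GET request for the cluster settings, defaults included."""
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings?include_defaults=true",
        method="GET",
    )


def send(request, timeout=10):
    """Send a prepared request and return (status, body text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling send(build_settings_get(ES_BASE)) and searching the body for "auto_create_index" shows whether auto-creation is switched off; send(build_test_post(ES_BASE)) then tries the first insert from the same host the Spark executors run on.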

Thanks,
Lionel

On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <ka...@tavant.com> wrote:

> HI Lionel,
>
>
>
> The metrics is being persisted in HDFS… This is good progress for us.
> Thank you for all your valuable help.
>
>
>
> We created an index for Griffin but we were not sure about what mappings
> we should use.
>
> Until we created this, we never got this index auto-created in ES…..
>
> And now that we have created the index, there are errors which are
> suggestive of missing “mappings”
>
>
>
> Is there an auto index create property that we need to enable somewhere in
> ES?
>
> I could not find anything in the config yml file though….
>
>
>
> Thank you,
>
> Karan Gupta
>
> *From:* Lionel Liu <li...@apache.org>
> *Sent:* Friday, May 4, 2018 2:11 PM
>
> *To:* Karan Gupta <ka...@tavant.com>
> *Cc:* dev@griffin.incubator.apache.org
> *Subject:* Re: No Index Formation in Elastic Search
>
>
>
> Hi Karan,
>
>
>
> For HTTP persistence, are the metrics persisted directly from “Spark”?
> (or) Griffin services writes into it?
>
> [Answer] The metrics are persisted directly from spark application.
>
>
>
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> from Griffin service, it will work…. But from Spark executors, it wont work
> as localhost resolves to executor host)
>
> [Answer] I think you can modify "localhost" to the ip address of ES.
>
>
>
> But we have not created any index in “ES” called “griffin” or
> “accuracy”….? What should we be doing here?
>
> [Answer] You don't need to create the indices in ES, ES will create it
> when post metrics to it.
>
>
>
> For the "email" and "sms" parameters, they are not enabled in this
> version, you can just ignore them in env.json.
>
>
>
> BTW, has the metrics been persisted on HDFS?
>
>
>
> Thanks,
>
> Lionel
>
>
>
>
>
>
>
> On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com>
> wrote:
>
> Hi,
>
>
>
> Thank you for the detail.
>
>
>
> In env.json, we have specified both HDFS and HTTP.
>
> For HTTP persistence, are the metrics persisted directly from “Spark”?
> (or) Griffin services writes into it?
>
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> from Griffin service, it will work…. But from Spark executors, it wont work
> as localhost resolves to executor host)
>
> But we have not created any index in “ES” called “griffin” or
> “accuracy”….? What should we be doing here?
>
>
>
> One more:
>
>
>
> Yesterday we found that “email” and “sms” parts of the env.json are not
> configured properly.
>
> They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not
> expect a List…
>
> This was causing Spark jobs not to launch.
>
> We edited the env.json accordingly…. We hope we did the right thing…
>
> Can you confirm this?
>
>
>
> Thank you,
>
> Karan Gupta
>
>
>
> *From:* Lionel Liu <li...@apache.org>
> *Sent:* Friday, May 4, 2018 11:46 AM
> *To:* Karan Gupta <ka...@tavant.com>
> *Cc:* dev@griffin.incubator.apache.org
> *Subject:* Re: No Index Formation in Elastic Search
>
>
>
> Hi Karan,
>
>
>
> First, we need to check has griffin successfully finished. What persist
> types did you configure in env.json? "log", "hdfs", "http"?
>
> - "log": print the metrics in application log.
>
> - "hdfs": the metrics will be persisted in hdfs path you've set.
>
> - "http": post the metrics to the "api" you've set, which should be the
> elasticsearch endpoint by default.
>
>
>
> You can choose multiple of them.
>
> If "http" is not configured correctly, post metrics to ES fails.
>
> If "hdfs" is configured, but you can not get any metric persisted in the
> "path", maybe griffin has not finish the calculation correctly.
>
> If "log" is configured, you can get the application log from yarn:
>
>     yarn logs -applicationId <appId> > applog
>
> Then read the applog, find if there's any output metric calculated.
>
> If there's no metric persisted by any type of your persist configuration,
> you need to read the applog, and find the error message. Then you can show
> it to me, I'll help you find it.
>
>
>
> Thanks,
>
> Lionel
>
>
>
>
>
> On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com>
> wrote:
>
> Hi Lionel,
>
>
>
> While the Spark Application gets finished, I do not see any Index getting
> created in the elastic search, hence I do not see the data quality metrics
> getting populated.
>
> Could you help me out with a possible solution?
>
>
>
>
>
> Thank you,
>
> Karan Gupta

RE: No Index Formation in Elastic Search

Posted by Karan Gupta <ka...@tavant.com>.
Hi Lionel,

The metrics are being persisted in HDFS. This is good progress for us. Thank you for all your valuable help.

We created an index for Griffin, but we were not sure which mappings we should use.
Until we created it, the index was never auto-created in ES.
And now that we have created the index, there are errors that suggest missing “mappings”.

Is there an auto-index-create property that we need to enable somewhere in ES?
I could not find anything in the config yml file, though.

Thank you,
Karan Gupta
From: Lionel Liu <li...@apache.org>
Sent: Friday, May 4, 2018 2:11 PM
To: Karan Gupta <ka...@tavant.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
[Answer] The metrics are persisted directly from spark application.

Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
[Answer] I think you can modify "localhost" to the ip address of ES.

But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?
[Answer] You don't need to create the indices in ES, ES will create it when post metrics to it.

For the "email" and "sms" parameters, they are not enabled in this version, you can just ignore them in env.json.

BTW, has the metrics been persisted on HDFS?

Thanks,
Lionel



On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com> wrote:
Hi,

Thank you for the detail.

In env.json, we have specified both HDFS and HTTP.
For HTTP persistence, are the metrics persisted directly from “Spark”? (or) Griffin services writes into it?
Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from Griffin service, it will work…. But from Spark executors, it wont work as localhost resolves to executor host)
But we have not created any index in “ES” called “griffin” or “accuracy”….? What should we be doing here?

One more:

Yesterday we found that “email” and “sms” parts of the env.json are not configured properly.
They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not expect a List…
This was causing Spark jobs not to launch.
We edited the env.json accordingly…. We hope we did the right thing…
Can you confirm this?

Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>
Sent: Friday, May 4, 2018 11:46 AM
To: Karan Gupta <ka...@tavant.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

First, we need to check has griffin successfully finished. What persist types did you configure in env.json? "log", "hdfs", "http"?
- "log": print the metrics in application log.
- "hdfs": the metrics will be persisted in hdfs path you've set.
- "http": post the metrics to the "api" you've set, which should be the elasticsearch endpoint by default.

You can choose multiple of them.
If "http" is not configured correctly, post metrics to ES fails.
If "hdfs" is configured, but you can not get any metric persisted in the "path", maybe griffin has not finish the calculation correctly.
If "log" is configured, you can get the application log from yarn:
    yarn logs -applicationId <appId> > applog
Then read the applog, find if there's any output metric calculated.
If there's no metric persisted by any type of your persist configuration, you need to read the applog, and find the error message. Then you can show it to me, I'll help you find it.

Thanks,
Lionel


On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com> wrote:
Hi Lionel,

While the Spark Application gets finished, I do not see any Index getting created in the elastic search, hence I do not see the data quality metrics getting populated.
Could you help me out with a possible solution?


Thank you,
Karan Gupta



Re: No Index Formation in Elastic Search

Posted by Lionel Liu <li...@apache.org>.
Hi Karan,

For HTTP persistence, are the metrics persisted directly from “Spark”, or do the
Griffin services write into it?
[Answer] The metrics are persisted directly from the Spark application.

Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from the
Griffin service, it will work. But from Spark executors, it won't work, as
localhost resolves to the executor host)

[Answer] I think you can modify "localhost" to the ip address of ES.
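
A rough sketch of that change, assuming env.json has a "persist" list whose "http" entry keeps its URL under "config" -> "api". Only the persist types and the "api" and "path" keys are mentioned in this thread; the rest of the layout, the HDFS path, and the IP 10.0.0.5 are illustrative assumptions.

```python
import json
from urllib.parse import urlsplit, urlunsplit

# Assumed shape of the persist section of env.json (illustrative only).
ENV_JSON = """
{
  "persist": [
    {"type": "log", "config": {}},
    {"type": "hdfs", "config": {"path": "hdfs:///griffin/persist"}},
    {"type": "http", "config": {"api": "http://localhost:9200/griffin/accuracy"}}
  ]
}
"""


def point_http_persist_at(env, es_host):
    """Replace a localhost host in every "http" persist entry with es_host."""
    for entry in env.get("persist", []):
        if entry.get("type") != "http":
            continue
        parts = urlsplit(entry["config"]["api"])
        if parts.hostname in ("localhost", "127.0.0.1"):
            # Keep the original port, swap only the host name.
            netloc = es_host if parts.port is None else f"{es_host}:{parts.port}"
            entry["config"]["api"] = urlunsplit(parts._replace(netloc=netloc))
    return env


env = point_http_persist_at(json.loads(ENV_JSON), "10.0.0.5")  # hypothetical IP
print(env["persist"][2]["config"]["api"])
```

The point is that the address must resolve from every Spark executor host, not only from the machine where the Griffin service runs.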


But we have not created any index in “ES” called “griffin” or “accuracy”.
What should we be doing here?
[Answer] You don't need to create the indices in ES; ES will create them when
metrics are posted to it.

The "email" and "sms" parameters are not enabled in this version, so
you can just ignore them in env.json.

BTW, have the metrics been persisted on HDFS?

Thanks,
Lionel



On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <ka...@tavant.com> wrote:

> Hi,
>
>
>
> Thank you for the detail.
>
>
>
> In env.json, we have specified both HDFS and HTTP.
>
> For HTTP persistence, are the metrics persisted directly from “Spark”?
> (or) Griffin services writes into it?
>
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> from Griffin service, it will work…. But from Spark executors, it wont work
> as localhost resolves to executor host)
>
> But we have not created any index in “ES” called “griffin” or
> “accuracy”….? What should we be doing here?
>
>
>
> One more:
>
>
>
> Yesterday we found that “email” and “sms” parts of the env.json are not
> configured properly.
>
> They appear as “array” in JSON… but the “EmailParam” and “SmsParam” do not
> expect a List…
>
> This was causing Spark jobs not to launch.
>
> We edited the env.json accordingly…. We hope we did the right thing…
>
> Can you confirm this?
>
>
>
> Thank you,
>
> Karan Gupta
>
>
>
> *From:* Lionel Liu <li...@apache.org>
> *Sent:* Friday, May 4, 2018 11:46 AM
> *To:* Karan Gupta <ka...@tavant.com>
> *Cc:* dev@griffin.incubator.apache.org
> *Subject:* Re: No Index Formation in Elastic Search
>
>
>
> Hi Karan,
>
>
>
> First, we need to check has griffin successfully finished. What persist
> types did you configure in env.json? "log", "hdfs", "http"?
>
> - "log": print the metrics in application log.
>
> - "hdfs": the metrics will be persisted in hdfs path you've set.
>
> - "http": post the metrics to the "api" you've set, which should be the
> elasticsearch endpoint by default.
>
>
>
> You can choose multiple of them.
>
> If "http" is not configured correctly, post metrics to ES fails.
>
> If "hdfs" is configured, but you can not get any metric persisted in the
> "path", maybe griffin has not finish the calculation correctly.
>
> If "log" is configured, you can get the application log from yarn:
>
>     yarn logs -applicationId <appId> > applog
>
> Then read the applog, find if there's any output metric calculated.
>
> If there's no metric persisted by any type of your persist configuration,
> you need to read the applog, and find the error message. Then you can show
> it to me, I'll help you find it.
>
>
>
> Thanks,
>
> Lionel
>
>
>
>
>
> On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com>
> wrote:
>
> Hi Lionel,
>
>
>
> While the Spark Application gets finished, I do not see any Index getting
> created in the elastic search, hence I do not see the data quality metrics
> getting populated.
>
> Could you help me out with a possible solution?
>
>
>
>
>
> Thank you,
>
> Karan Gupta

RE: No Index Formation in Elastic Search

Posted by Karan Gupta <ka...@tavant.com>.
Hi,

Thank you for the detail.

In env.json, we have specified both HDFS and HTTP.
For HTTP persistence, are the metrics persisted directly from “Spark”, or do the Griffin services write into it?
Our URL is like this: http://localhost:9200/griffin/accuracy (if it is from the Griffin service, it will work. But from Spark executors, it won't work, as localhost resolves to the executor host)
But we have not created any index in “ES” called “griffin” or “accuracy”. What should we be doing here?

One more:

Yesterday we found that the “email” and “sms” parts of env.json are not configured properly.
They appear as an “array” in JSON, but “EmailParam” and “SmsParam” do not expect a List.
This was preventing the Spark jobs from launching.
We edited env.json accordingly. We hope we did the right thing.
Can you confirm this?

Thank you,
Karan Gupta

From: Lionel Liu <li...@apache.org>
Sent: Friday, May 4, 2018 11:46 AM
To: Karan Gupta <ka...@tavant.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: No Index Formation in Elastic Search

Hi Karan,

First, we need to check has griffin successfully finished. What persist types did you configure in env.json? "log", "hdfs", "http"?
- "log": print the metrics in application log.
- "hdfs": the metrics will be persisted in hdfs path you've set.
- "http": post the metrics to the "api" you've set, which should be the elasticsearch endpoint by default.

You can choose multiple of them.
If "http" is not configured correctly, post metrics to ES fails.
If "hdfs" is configured, but you can not get any metric persisted in the "path", maybe griffin has not finish the calculation correctly.
If "log" is configured, you can get the application log from yarn:
    yarn logs -applicationId <appId> > applog
Then read the applog, find if there's any output metric calculated.
If there's no metric persisted by any type of your persist configuration, you need to read the applog, and find the error message. Then you can show it to me, I'll help you find it.

Thanks,
Lionel


On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com> wrote:
Hi Lionel,

While the Spark Application gets finished, I do not see any Index getting created in the elastic search, hence I do not see the data quality metrics getting populated.
Could you help me out with a possible solution?


Thank you,
Karan Gupta


Re: No Index Formation in Elastic Search

Posted by Lionel Liu <li...@apache.org>.
Hi Karan,

First, we need to check whether Griffin finished successfully. Which persist
types did you configure in env.json? "log", "hdfs", "http"?
- "log": prints the metrics in the application log.
- "hdfs": persists the metrics in the HDFS path you've set.
- "http": posts the metrics to the "api" you've set, which should be the
Elasticsearch endpoint by default.

You can choose more than one of them.
If "http" is not configured correctly, posting metrics to ES fails.
If "hdfs" is configured but you cannot find any metrics persisted in the
"path", Griffin may not have finished the calculation correctly.
If "log" is configured, you can get the application log from YARN:
    yarn logs -applicationId <appId> > applog
Then read the applog to see whether any output metric was calculated.
If there's no metric persisted by any type of your persist configuration,
you need to read the applog, and find the error message. Then you can show
it to me, I'll help you find it.
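
To narrow the applog down before reading it in full, something like the sketch below could help. It assumes the log was saved to a file named applog by the yarn command above. The keyword pattern is only a heuristic for lines that usually signal a failure, not anything specific to Griffin.

```python
import re

# Heuristic: substrings that commonly mark failures in Spark/YARN logs.
ERROR_PATTERN = re.compile(r"ERROR|Exception|Caused by")


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that look like errors."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


def scan_applog(path="applog"):
    """Scan the saved YARN application log for suspicious lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)
```

scan_applog() returns the matching lines with their line numbers, which is usually enough to locate the first stack trace to share on the list.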

Thanks,
Lionel


On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <ka...@tavant.com> wrote:

> Hi Lionel,
>
>
>
> While the Spark Application gets finished, I do not see any Index getting
> created in the elastic search, hence I do not see the data quality metrics
> getting populated.
>
> Could you help me out with a possible solution?
>
>
>
>
>
> Thank you,
>
> Karan Gupta