Posted to user@spot.apache.org by Jason Xunchao Chen <sz...@gmail.com> on 2017/05/25 05:03:30 UTC

Apache Spot data format

Hi there,

I'm deploying Spot and have a few questions about the data format of the
telemetry. I saw the table definition in avro_parquet.hql for the DB store.
Are those the fields the raw log data should contain initially?
I have some internal network traffic data. If I understand correctly, I
need to perform ETL on this data so that it matches the format Spot can
work with, right?

Thanks!
Jason

Re: Apache Spot data format

Posted by Jason Xunchao Chen <sz...@gmail.com>.
Great! Thanks a lot.

Regards,
Xunchao(Jason) Chen

Re: Apache Spot data format

Posted by "Barona, Ricardo" <ri...@intel.com>.
Ok, Victor Gonzalez helped me with these:

treceived = time the flow was received by the collector
rip = router IP address
The columns with the tr- prefix (year, month and day) are derived from treceived.
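
As a rough illustration of that derivation, here is a minimal Python sketch. It assumes treceived arrives as a string such as "2017-05-25 05:03:30" and that the derived columns are named tryear, trmonth and trday. Both the timestamp layout and those exact column names are assumptions made for this example, so check them against load_flow_avro_parquet.hql before relying on them.

```python
from datetime import datetime

# Assumed layout of the treceived string; adjust to match your collector output.
TRECEIVED_FORMAT = "%Y-%m-%d %H:%M:%S"


def derive_tr_columns(treceived: str) -> dict:
    """Split a treceived timestamp into its year, month and day parts."""
    ts = datetime.strptime(treceived, TRECEIVED_FORMAT)
    return {
        "treceived": treceived,
        # Hypothetical column names built from the tr- prefix.
        "tryear": ts.year,
        "trmonth": ts.month,
        "trday": ts.day,
    }


if __name__ == "__main__":
    print(derive_tr_columns("2017-05-25 05:03:30"))
```

In an ETL job, a helper like this would run once per source record, before the row is written to the Spot table.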

Let me know if this helps.

Thanks.




Re: Apache Spot data format

Posted by "Barona, Ricardo" <ri...@intel.com>.
I have already asked around and will get back to you soon.



Re: Apache Spot data format

Posted by Jason Xunchao Chen <sz...@gmail.com>.
Hi Ricardo,
Thanks for the reply.

https://github.com/apache/incubator-spot/blob/master/spot-ingest/pipelines/flow/load_flow_avro_parquet.hql

I figured out what most of the columns mean. Could I ask what the first
column, "treceived", and the last column, "rip", mean? And what does the
"tr-" prefix mean?

Thanks!

Regards,
Xunchao(Jason) Chen


Re: Apache Spot data format

Posted by "Barona, Ricardo" <ri...@intel.com>.
Hi Jason,

Yes, the schema in the hql scripts is the format Spot will try to read. And yes, you are correct: an ETL step from your source traffic data would be a good approach to extract it and load it into the Spot tables.
Please let me know if you have any issues with the mapping or any questions about a field.

Thanks!
