Posted to common-user@hadoop.apache.org by Christian Schneider <cs...@gmail.com> on 2013/03/13 17:51:14 UTC

Access and archive Job Tracker logs

Hi,
For billing purposes we would like to access the JobTracker logs through
an API (the kind of information we see in the JobTracker web interface).

Is there a tool or library for that?

It would also be good if those logs could be archived for later use (or
kept in a database).

Our requirements are something like:
* all jobs by year and month
* all jobs by year, month and user
* search by job name
...

Thank you :)

Best Regards,
Christian.

Re: Access and archive Job Tracker logs

Posted by Christian Schneider <cs...@gmail.com>.
I dug a little deeper and found an interesting "*mapreduce/history"
folder on the JobTracker.

Could you tell me how to use these files? I need to archive them and
search them later (by date or job name).

michaela 18:29:31 /var/log/hadoop-0.20-mapreduce/history # tree
.
├── done
│   ├── michaela.ixcloud.net_1361182032467_
│   │   └── 2013
│   │       └── 01
│   │           └── 18
│   ├── michaela.ixcloud.net_1361195731054_
│   │   └── 2013
│   │       └── 01
│   │           └── 19
│   ├── michaela.ixcloud.net_1363254634813_
│   │   └── 2013
│   │       └── 03
│   │           └── 14
│   │               └── 000000
│   │                   ├── job_201303141050_0001_1363254715201_christian_rep120.case3ii
│   │                   ├── job_201303141050_0002_1363254735631_christian_rep120.case3ii
│   │                   ├── job_201303141050_0003_1363254752878_christian_rep120.case3ii
│   │                   ├── job_201303141050_0004_1363254766964_christian_rep120.case3ii
│   │                   ├── job_201303141050_0005_1363254779720_christian_rep120.case3ii
│   │                   ├── job_201303141050_0006_1363254851856_panshul_PigLatin%3Agametypecounter
│   │                   ├── job_201303141050_0007_1363255612659_lealem_gameDataTimeSeriesJob
│   │                   ├── job_201303141050_0010_1363263699635_lealem_gameDataTimeSeriesJob
│   │                   ├── michaela.ixcloud.net_1363254634813_job_201303141050_0001_conf.xml
│   │                   ├── michaela.ixcloud.net_1363254634813_job_201303141050_0002_conf.xml
│   │                   ├── michaela.ixcloud.net_1363254634813_job_201303141050_0003_conf.xml
│   │                   ├── michaela.ixcloud.net_1363254634813_job_201303141050_0004_conf.xml
│   │                   ├── michaela.ixcloud.net_1363254634813_job_201303141050_0005_conf.xml
│   │                   ├── michaela.ixcloud.net_1363254634813_job_201303141050_0006_conf.xml
│   │                   ├── michaela.ixcloud.net_1363254634813_job_201303141050_0007_conf.xml
│   │                   └── michaela.ixcloud.net_1363254634813_job_201303141050_0010_conf.xml
│   └── michaela.ixcloud.net_1363269263476_
│       └── 2013
│           └── 03
│               └── 14
│                   └── 000000
│                       ├── job_201303141454_0001_1363269701945_root_importtsv_timeSeries
│                       ├── job_201303141454_0002_1363269757097_hdfs_importtsv_timeSeries
│                       ├── job_201303141454_0003_1363271517403_christian_rep120.case3i
│                       ├── job_201303141454_0005_1363280494548_lealem_gameDataTimeSeriesJob
│                       ├── michaela.ixcloud.net_1363269263476_job_201303141454_0001_conf.xml
│                       ├── michaela.ixcloud.net_1363269263476_job_201303141454_0002_conf.xml
│                       ├── michaela.ixcloud.net_1363269263476_job_201303141454_0003_conf.xml
│                       └── michaela.ixcloud.net_1363269263476_job_201303141454_0005_conf.xml
├── job_201303061254_0010_1362574214308_tim_SELECT+*+FROM+temp_panshul_table+WHERE+...10%28Stage
├── job_201303061458_0015_1362582264552_tim_hivetest42
├── job_201303061737_0006_1362588850460_christian_rankingJob
├── job_201303141454_0006_1363281646038_lars+job+2009
├── michaela.ixcloud.net_1362570865029_job_201303061254_0010_conf.xml
├── michaela.ixcloud.net_1362578315853_job_201303061458_0015_conf.xml
├── michaela.ixcloud.net_1362587825736_job_201303061737_0006_conf.xml
└── michaela.ixcloud.net_1363269263476_job_201303141454_0006_conf.xml
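The following is a minimal sketch for indexing and archiving these files. It assumes the history file names follow the pattern visible in the listing: the job ID (job_<JobTracker id>_<sequence>), the submit time in epoch milliseconds, the user, and the URL-encoded job name, all separated by underscores. That pattern is inferred from the file names above, not taken from Hadoop documentation. The sketch also assumes user names contain no underscore. Names that do not match are skipped, including the *_conf.xml files and job_201303141454_0006_1363281646038_lars+job+2009, which does not seem to fit the pattern. The function names and the archive layout (YYYY/MM by UTC submit time) are hypothetical.

```python
import os
import re
import shutil
from datetime import datetime, timezone
from urllib.parse import unquote_plus

# Assumed name pattern, inferred from the listing (not from Hadoop docs):
#   job_<jobtracker id>_<seq>_<submit time, epoch ms>_<user>_<encoded name>
# User names are assumed to contain no underscore; job names may.
HISTORY_RE = re.compile(
    r"^(?P<job_id>job_\d{12}_\d+)_(?P<submit_ms>\d+)_"
    r"(?P<user>[^_]+)_(?P<name>.+)$")


def parse_history_name(filename):
    """Parse a job history file name; return None if it does not match."""
    m = HISTORY_RE.match(filename)
    if m is None:
        return None  # e.g. *_conf.xml files
    submit_ms = int(m.group("submit_ms"))
    return {
        "job_id": m.group("job_id"),
        "user": m.group("user"),
        # decode '+' as space and %XX escapes (e.g. %3A -> ':')
        "job_name": unquote_plus(m.group("name")),
        "submit_ms": submit_ms,
        "submitted": datetime.fromtimestamp(submit_ms / 1000, tz=timezone.utc),
    }


def scan_history_dir(root):
    """Walk root (e.g. the history folder above) and yield parsed records."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in sorted(filenames):
            rec = parse_history_name(fn)
            if rec is not None:
                rec["path"] = os.path.join(dirpath, fn)
                yield rec


def archive_records(records, dest_root):
    """Copy each history file to dest_root/YYYY/MM/ by UTC submit time."""
    copied = []
    for rec in records:
        month_dir = os.path.join(dest_root,
                                 rec["submitted"].strftime("%Y"),
                                 rec["submitted"].strftime("%m"))
        os.makedirs(month_dir, exist_ok=True)
        target = os.path.join(month_dir, os.path.basename(rec["path"]))
        shutil.copy2(rec["path"], target)
        copied.append(target)
    return copied
```

Each record carries job_id, user, job_name and submit_ms, so it could be loaded into a database for the by-month and by-user queries. The matching *_conf.xml files could be copied alongside by job ID if the full job configuration is needed later.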

I found some information in [1] and [2], but those articles are from
2009-2010. I am also not able to read the files with the
"hadoop job -history" command [3], because a _logs/history folder is
missing.

Best Regards,
Christian.

[1] http://blog.cloudera.com/blog/2009/09/apache-hadoop-log-files-where-to-find-them-in-cdh-and-what-info-they-contain/
[2] http://blog.cloudera.com/blog/2010/11/hadoop-log-location-and-retention/
[3] http://archive.cloudera.com/cdh4/cdh/4/mr1/cluster_setup.html#Logging
