Posted to common-dev@hadoop.apache.org by "Edward Yoon (JIRA)" <ji...@apache.org> on 2007/12/21 09:56:43 UTC

[jira] Created: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

[Hbase Shell] Log Analysis Examples
-----------------------------------

                 Key: HADOOP-2480
                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
             Project: Hadoop
          Issue Type: New Feature
          Components: contrib/hbase
    Affects Versions: 0.16.0
         Environment: All
            Reporter: Edward Yoon
            Assignee: Edward Yoon
            Priority: Trivial
             Fix For: 0.16.0


I made an Apache log fetcher, a log analyzer, and a social network analyzer using map/reduce on HBase tables at large scale.
I think these are good general usage examples, so I'd like to contribute them to HBase.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560761#action_12560761 ] 

otis edited comment on HADOOP-2480 at 1/19/08 6:40 PM:
-------------------------------------------------------------------

Ed - I was really looking forward to this work getting into contrib (it seems like a very practical hbase example).  Do you plan on continuing with this work later, or are you stopping the work?  Ah, waiting for 0.16?  I am especially curious to see what you have in mind for social network analysis.

      was (Author: otis):
    Ed - I was really looking forward to this work getting into contrib (it seems like a very practical hbase example).  Do you plan on continuing with this work later, or are you stopping the work?  I was especially curious to see what you have in mind for social network analysis.
  
> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer using map/reduce on HBase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * The url family is a historical page-move vector of the client.
> * Row by url is a user-by-document matrix.
> ** A cell can be a numeric document visit frequency or an incoming value from the specified web.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis
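
One way to read "Next Page Recommendation" against the page-move vector in the quoted description: count which page most often follows each page in the clients' visit histories. The sketch below is an assumption about what such a model could look like, not the approach taken in the attached patches. It is plain Python with no HBase or map/reduce calls, and the names `count_transitions` and `recommend_next` are hypothetical.

```python
from collections import Counter, defaultdict


def count_transitions(histories):
    """Count page-to-next-page moves.

    `histories` maps a client ip to its time-ordered list of visited URLs
    (that client's page-move vector).
    """
    transitions = defaultdict(Counter)
    for pages in histories.values():
        # Pair each page with the page visited immediately after it.
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def recommend_next(transitions, page):
    """Return the most frequent follower of `page`, or None if none was seen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]
```

In a map/reduce setting the counting step would presumably be split into a map over clients and a reduce over page pairs; that split is not shown here.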



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554638 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

Thanks for your review, Fredrik.
I believe the bulk upload will be resolved at HADOOP-2075.

I am running a large-scale test now, so I know that uploading to HBase takes a long time.




HADOOP-1824 | Proposed implementation

Posted by "Goel, Ankur" <An...@corp.aol.com>.
 
Hi,
   I am developing an InputFormat for zip files, as required by
HADOOP-1824. I would like to propose a simple approach and invite
comments and suggestions from the community on my implementation.

Implementation Approach
-----------------------

1. Implement class ZipInputFormat to extend FileInputFormat.

2. Override the getSplits() method to read each file's
   InputStream and construct a ZipInputStream out of it.

3. Create FileSplits so that each file split has the following
   properties:
      *  FileSplit.start = start index of a zip entry.
      *  FileSplit.length = end index of a zip entry.
      *  FileSplit.file = Zip file.
      *  Sum of compressed size of zip entries <= splitSize

   For example, start = 3, length = 6 signifies that zip entries 3 to 6
   will be read from the zip file of this split.

4. Implement class ZipRecordReader to read each zip entry in its split
   using LineRecordReader.
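
The grouping rule in step 3 can be sketched outside Hadoop. The following is a minimal Python sketch, not the proposed Java implementation: it uses the standard `zipfile` module, and the function name `plan_entry_splits` and the parameter `split_size` are hypothetical. It packs consecutive zip entries into (start, end) index ranges whose compressed sizes sum to at most `split_size`. Two assumptions go beyond the proposal: indices are 0-based and inclusive, and an entry larger than `split_size` gets a range of its own.

```python
import io
import zipfile


def plan_entry_splits(zip_source, split_size):
    """Group consecutive zip entries into (start, end) index ranges.

    Indices are 0-based and inclusive (an assumption). Each range keeps the
    sum of compressed sizes <= split_size, except that a single entry larger
    than split_size forms a range by itself.
    """
    with zipfile.ZipFile(zip_source) as archive:
        sizes = [info.compress_size for info in archive.infolist()]

    splits = []
    start = 0
    running = 0
    for index, size in enumerate(sizes):
        # Close the current range when adding this entry would exceed the limit.
        if index > start and running + size > split_size:
            splits.append((start, index - 1))
            start = index
            running = 0
        running += size
    if sizes:
        splits.append((start, len(sizes) - 1))
    return splits


if __name__ == "__main__":
    # Build a small uncompressed archive in memory and print its split plan.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for number, length in enumerate([10, 10, 10, 25, 5]):
            archive.writestr("entry%d.txt" % number, b"x" * length)
    buffer.seek(0)
    print(plan_entry_splits(buffer, split_size=20))
```

Only the central directory is read to plan the splits, so no entry has to be decompressed at this stage.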

I think I might need to deal with CompressionCodecFactory and other
classes related to compression. Exactly how is not very clear to me,
so any hints here would be useful.

Apart from the above please let me know if there is anything that I am 
missing.

Thanks
-Ankur

[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554389 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

The slaves seem to return a "MasterNotRunningException" error in my cluster.
What is wrong?



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554991 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

Hmm...
It seems okay, but in every case we can code it easily using the API. :(
PIG is difficult to learn, too.



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer using map/reduce on HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*

* The url family is a historical page-move vector of the client.
* Row by url is a user-by-document matrix.
** A cell can be a numeric document visit frequency or an incoming value from the specified web.
* ... etc.

{code}
ip <row>    http                            url               
-------------------------------------------------------------------
ip          http:agent     <agent>          url:URL   <referrer>
            http:protocol  <protocol>       ...
            http:method    <method>         
            http:code      <response code>
            http:bytesize  <bytesize>           
{code}

  was:
I made an Apache log fetcher, a log analyzer, and a social network analyzer using map/reduce on HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*

* The specified 'client:ip' is a historical page-move vector.
* Row by client is a user-by-document matrix.
** A cell can be a numeric value of document visit frequency.
* ... etc.

{code}
url <row>    http                            client                  referrer                                       
------------------------------------------------------------------------------------------------------------------------
/FrontPage   http:protocol  <protocol>       client:ip   <URL>       referrer:http://www.google.co.kr/q?=hadoop  <IP>
             http:method    <method>         ....
             http:code      <response code>
             http:bytesize  <bytesize>
{code}
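
To make the mapping concrete, here is a minimal Python sketch that parses one access_log line in the format tabulated above and lays it out as cells for the ip-keyed schema of the updated description. The regular expression, the function names `parse_access_log` and `to_ip_row`, and the plain-dict representation of a row are assumptions for illustration. This is not the code in the attached patch, and it makes no HBase calls.

```python
import re

# One line assembled from the example data elements in the table above.
SAMPLE_LINE = (
    '208.177.157.164 - - [15/Aug/2004:10:59:38 -0800] '
    '"GET http://www.hadoop.co.kr/ HTTP/1.1" 200 - "-" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"'
)

# Fields appear in the same order as the rows of the Access_log Entry table.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytesize>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log(line):
    """Split one access_log line into a dict of named fields."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError("unrecognized access_log line: %r" % line)
    return match.groupdict()


def to_ip_row(entry):
    """Lay out a parsed entry as (row key, cells) for the ip-keyed schema."""
    cells = {
        "http:agent": entry["agent"],
        "http:protocol": entry["protocol"],
        "http:method": entry["method"],
        "http:code": entry["code"],
        "http:bytesize": entry["bytesize"],
        # url:<requested URL> holds the referrer, as in the schema sketch.
        "url:" + entry["resource"]: entry["referrer"],
    }
    return entry["ip"], cells


if __name__ == "__main__":
    row_key, cells = to_ip_row(parse_access_log(SAMPLE_LINE))
    print(row_key)
    for column in sorted(cells):
        print("  %s = %s" % (column, cells[column]))
```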




[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555252#action_12555252 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

>> why not use the TableOutputFormat and run lots of reducers so lots of clients going against hbase? -- stack.

Because it is impossible to assign a specified timestamp to each row.
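
The per-row timestamp in question would naturally come from the request time in each log entry. Below is a minimal Python sketch of that conversion, assuming the Apache time format shown in the issue description (`[15/Aug/2004:10:59:38 -0800]`) and a timestamp expressed in milliseconds since the Unix epoch. The function name `log_time_to_millis` is hypothetical, and how the value would be handed to HBase is not shown.

```python
import re
from datetime import datetime, timedelta, timezone

# Month abbreviations are mapped by hand so parsing does not depend on the locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

TIME_PATTERN = re.compile(
    r"\[?(\d{2})/([A-Za-z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) "
    r"([+-])(\d{2})(\d{2})\]?"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def log_time_to_millis(field):
    """Convert an access_log time field to milliseconds since the Unix epoch."""
    match = TIME_PATTERN.fullmatch(field.strip())
    if match is None or match.group(2) not in MONTHS:
        raise ValueError("unrecognized time field: %r" % field)
    day, month, year, hour, minute, second, sign, off_h, off_m = match.groups()
    offset = timedelta(hours=int(off_h), minutes=int(off_m))
    if sign == "-":
        offset = -offset
    moment = datetime(
        int(year), MONTHS[month], int(day),
        int(hour), int(minute), int(second),
        tzinfo=timezone(offset),
    )
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)
```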



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer using map/reduce on HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*

* The specified 'client:ip' is a historical page-move vector.
* Row by client is a user-by-document matrix.
** A cell can be a numeric value of document visit frequency.
* ... etc.

{code}
url <row>    http                            client                  referrer                                       
------------------------------------------------------------------------------------------------------------------------
/FrontPage   http:protocol  <protocol>       client:ip   <URL>       referrer:http://www.google.co.kr/q?=hadoop  <IP>
             http:method    <method>         ....
             http:code      <response code>
             http:bytesize  <bytesize>
{code}

  was:
I made an Apache log fetcher, a log analyzer, and a social network analyzer using map/reduce on HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*
{code}
url <row>    http                            client                  referrer                                       
------------------------------------------------------------------------------------------------------------------------
/FrontPage   http:protocol  <protocol>       client:ip   <URL>       referrer:http://www.google.co.kr/q?=hadoop  <IP>
             http:method    <method>         ....
             http:code      <response code>
             http:bytesize  <bytesize>
{code}
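
The "user by document matrix" described for this url-keyed layout can be pictured with a small in-memory sketch. The Python below is an illustration only. It assumes the cell value is a visit count, as the description suggests, and uses nested dictionaries in place of an HBase table. The name `build_visit_matrix` is hypothetical.

```python
from collections import defaultdict


def build_visit_matrix(visits):
    """Count visits per (url row, client:<ip> column).

    `visits` is an iterable of (ip, url) pairs, one per logged request.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for ip, url in visits:
        matrix[url]["client:" + ip] += 1
    # Convert to plain dicts so the result compares and prints cleanly.
    return {url: dict(columns) for url, columns in matrix.items()}


if __name__ == "__main__":
    requests = [
        ("208.177.157.164", "/FrontPage"),
        ("208.177.157.164", "/FrontPage"),
        ("10.0.0.7", "/FrontPage"),
    ]
    print(build_visit_matrix(requests))
```

Keying the columns by client ip keeps one row per document, so a row scan returns every client that visited a page along with its frequency.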




[jira] Work stopped: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-2480 stopped by Edward Yoon.

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>         Attachments: v01.patch, v02.patch
>
>
> I made an apache log fetcher, log analyzer, social network analyzer using map/reduce on hbase table for large scale .
> - 5 Terra Bytes Logs will be used. You can see at here : http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * url family is a historical page-move vector of client.
> * row by url is a user by document matrix. 
> ** cell can be a numeric value of document visit frequency or a incoming value from specified web.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis
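
As a rough illustration of the "user by document matrix" idea in the schema notes above, the sketch below counts visits per client IP and URL in plain Java. It does not touch the HBase API. The class name VisitMatrixSketch, the build method, the in-memory maps, and the /about.php URL are all assumptions made for illustration: the outer key stands in for the ip row, and each "url:<URL>" key stands in for a column in the url family.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the ip-row / url-family layout. */
final class VisitMatrixSketch {

    /**
     * Builds row -> (column -> visit count) from (ip, url) pairs.
     * Each pair is a two-element array: {clientIp, requestedUrl}.
     */
    static Map<String, Map<String, Integer>> build(List<String[]> visits) {
        Map<String, Map<String, Integer>> rows = new TreeMap<>();
        for (String[] visit : visits) {
            String row = visit[0];               // row key: client IP
            String column = "url:" + visit[1];   // column: family "url" plus the URL as qualifier
            rows.computeIfAbsent(row, k -> new TreeMap<>())
                .merge(column, 1, Integer::sum); // cell: visit frequency
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> visits = Arrays.asList(
            new String[] {"208.177.157.164", "/index.php"},
            new String[] {"208.177.157.164", "/index.php"},
            new String[] {"208.177.157.164", "/about.php"}); // hypothetical second page
        System.out.println(build(visits));
    }
}
```

In a real job the counting would presumably happen in a reducer and the cells would be written to the table; the nested map is only meant to make the row and column layout concrete.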



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
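
To make the field layout in the table above concrete, here is a minimal sketch of a parser for one access_log line, written in plain Java with a regular expression. It is not taken from the attached patches. The class name AccessLogParser and the field names (ip, identity, user, time, method, resource, protocol, status, bytes, referrer, agent) are assumptions chosen to mirror the table rows, and the pattern assumes a well-formed line with a three-part request and no embedded quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for one access_log line; not the code from the patches. */
final class AccessLogParser {

    // Order follows the table: ip, identity, user, [time], "request", status, bytes, "referrer", "agent".
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    // Hypothetical field names, in the same order as the capture groups above.
    private static final String[] FIELDS = {
        "ip", "identity", "user", "time", "method", "resource", "protocol",
        "status", "bytes", "referrer", "agent"};

    /** Returns the fields of one log line keyed by name, or throws if the line does not match. */
    static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a recognized access_log line: " + line);
        }
        Map<String, String> entry = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            entry.put(FIELDS[i], m.group(i + 1)); // capture groups are 1-based
        }
        return entry;
    }

    public static void main(String[] args) {
        // Sample line assembled from the example data elements in the table.
        String sample = "208.177.157.164 - - [15/Aug/2004:10:59:38 -0800] "
            + "\"GET http://www.hadoop.co.kr/ HTTP/1.1\" 200 - \"-\" "
            + "\"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)\"";
        parse(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Lines that deviate from this shape (for example, a malformed request string) are rejected rather than partially parsed, which seems the safer default for a bulk load.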

  was:
I made an apache log fetcher, log analyzer, social network analyzer using map/reduce on hbase table for large scale .
- *(5 Terra Bytes Logs wil be used)*

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|


> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on HBase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554655 ] 

stack commented on HADOOP-2480:
-------------------------------

Yeah, why not use TableOutputFormat and run lots of reducers, so that lots of clients are going against hbase?

Creating a table instance per map task is expensive.  Do it once in the init.

You have hard-coded paths in your code to udanax/temp.  You might want to remove that.
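
The "do it once in the init" advice can be sketched without any Hadoop or HBase dependency. The stack trace posted in this thread shows HTable.<init> being reached from LogFetcher$MapClass.map, which suggests a table handle per map() call. In the sketch below, FakeTable is a hypothetical stand-in for such a handle and only counts constructions; PerRecordMapper, InitOnceMapper, and countOpens are likewise invented names, and configure() is simply named after the init hook of Hadoop's mapred API.

```java
final class TableReuseSketch {

    /** Hypothetical stand-in for an expensive client handle (for example, a table connection). */
    static final class FakeTable {
        static int opened = 0;               // how many handles have been constructed
        FakeTable(String name) { opened++; } // a real client would contact the cluster here
        void put(String row) { /* a real client would send the row */ }
    }

    /** Pattern suggested by the stack trace: a new handle inside every map() call. */
    static final class PerRecordMapper {
        void map(String line) {
            new FakeTable("access_log").put(line);
        }
    }

    /** Pattern suggested in the comment: open once in an init hook, then reuse. */
    static final class InitOnceMapper {
        private FakeTable table;
        void configure() { table = new FakeTable("access_log"); }
        void map(String line) { table.put(line); }
    }

    /** Feeds dummy lines through one mapper and returns how many handles were opened. */
    static int countOpens(boolean reuse, int records) {
        FakeTable.opened = 0;
        if (reuse) {
            InitOnceMapper mapper = new InitOnceMapper();
            mapper.configure();
            for (int i = 0; i < records; i++) mapper.map("line " + i);
        } else {
            PerRecordMapper mapper = new PerRecordMapper();
            for (int i = 0; i < records; i++) mapper.map("line " + i);
        }
        return FakeTable.opened;
    }

    public static void main(String[] args) {
        System.out.println("per record: " + countOpens(false, 1000) + " handles");
        System.out.println("init once:  " + countOpens(true, 1000) + " handles");
    }
}
```

With a real client, each construction would involve locating the master and region servers, so the per-record variant multiplies that cost by the number of input lines.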

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on HBase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * The url family is a historical page-move vector of the client.
> * Rows by url columns form a user-by-document matrix.
> ** A cell can hold a numeric document visit frequency or an incoming value from a specified web site.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Attachment: v01.patch

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on HBase tables at large scale.
> I think these examples show general usage.
> So I'd like to contribute them to HBase.



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Fredrik Hedberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554558 ] 

Fredrik Hedberg commented on HADOOP-2480:
-----------------------------------------

May I ask why you're manually inserting data into HBase and not just emitting the rows and using the HBase IdentityReducer?




[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554385 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

{code}
hql > jar ./build/contrib/hbase/hadoop-0.16.0-dev-hbase-examples.jar logfetcher udanax/logs access_log;
1. access_log table already exists.
2. access_log files fetching using map/reduce
07/12/26 17:39:32 INFO mapred.FileInputFormat: Total input paths to process : 1
07/12/26 17:39:33 INFO mapred.JobClient: Running job: job_200712261737_0001
07/12/26 17:39:34 INFO mapred.JobClient:  map 0% reduce 0%
07/12/26 17:40:06 INFO mapred.JobClient:  map 1% reduce 0%
07/12/26 17:40:26 INFO mapred.JobClient:  map 2% reduce 0%
07/12/26 17:40:46 INFO mapred.JobClient:  map 3% reduce 0%
07/12/26 17:41:06 INFO mapred.JobClient:  map 4% reduce 0%
07/12/26 17:41:12 INFO mapred.JobClient: Task Id : task_200712261737_0001_m_000002_0, Status : FAILED
org.apache.hadoop.hbase.MasterNotRunningException
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getMaster(HConnectionManager.java:209)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:575)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:462)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:479)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:534)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getTableServers(HConnectionManager.java:312)
        at org.apache.hadoop.hbase.HTable.<init>(HTable.java:92)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:81)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:47)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)

07/12/26 17:41:13 INFO mapred.JobClient: Task Id : task_200712261737_0001_m_000003_0, Status : FAILED
org.apache.hadoop.hbase.MasterNotRunningException
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getMaster(HConnectionManager.java:209)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:575)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:462)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:479)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:534)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getTableServers(HConnectionManager.java:312)
        at org.apache.hadoop.hbase.HTable.<init>(HTable.java:92)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:81)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:47)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)

07/12/26 17:41:27 INFO mapred.JobClient:  map 5% reduce 0%
07/12/26 17:41:46 INFO mapred.JobClient:  map 6% reduce 0%
07/12/26 17:42:11 INFO mapred.JobClient:  map 7% reduce 0%
07/12/26 17:42:26 INFO mapred.JobClient:  map 8% reduce 0%
07/12/26 17:42:46 INFO mapred.JobClient:  map 9% reduce 0%
07/12/26 17:42:48 INFO mapred.JobClient: Task Id : task_200712261737_0001_m_000004_0, Status : FAILED
org.apache.hadoop.hbase.MasterNotRunningException
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getMaster(HConnectionManager.java:209)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:575)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:462)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:479)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:534)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getTableServers(HConnectionManager.java:312)
        at org.apache.hadoop.hbase.HTable.<init>(HTable.java:92)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:81)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:47)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)

07/12/26 17:42:48 INFO mapred.JobClient: Task Id : task_200712261737_0001_m_000005_0, Status : FAILED
org.apache.hadoop.hbase.MasterNotRunningException
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getMaster(HConnectionManager.java:209)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:575)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:462)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:479)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:534)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getTableServers(HConnectionManager.java:312)
        at org.apache.hadoop.hbase.HTable.<init>(HTable.java:92)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:81)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:47)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)

07/12/26 17:43:11 INFO mapred.JobClient:  map 10% reduce 0%
07/12/26 17:43:31 INFO mapred.JobClient:  map 11% reduce 0%
07/12/26 17:43:52 INFO mapred.JobClient:  map 11% reduce 1%
07/12/26 17:43:57 INFO mapred.JobClient:  map 12% reduce 1%
07/12/26 17:44:02 INFO mapred.JobClient:  map 12% reduce 3%
07/12/26 17:44:22 INFO mapred.JobClient:  map 13% reduce 3%
07/12/26 17:44:25 INFO mapred.JobClient: Task Id : task_200712261737_0001_m_000006_0, Status : FAILED
org.apache.hadoop.hbase.MasterNotRunningException
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getMaster(HConnectionManager.java:209)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:575)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:462)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:479)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:534)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getTableServers(HConnectionManager.java:312)
        at org.apache.hadoop.hbase.HTable.<init>(HTable.java:92)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:81)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:47)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)

07/12/26 17:44:25 INFO mapred.JobClient: Task Id : task_200712261737_0001_m_000007_0, Status : FAILED
org.apache.hadoop.hbase.MasterNotRunningException
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getMaster(HConnectionManager.java:209)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:575)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:462)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:479)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:534)
        at org.apache.hadoop.hbase.HConnectionManager$TableServers.getTableServers(HConnectionManager.java:312)
        at org.apache.hadoop.hbase.HTable.<init>(HTable.java:92)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:81)
        at org.apache.hadoop.hbase.LogFetcher$MapClass.map(LogFetcher.java:47)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)

07/12/26 17:44:37 INFO mapred.JobClient:  map 14% reduce 3%
07/12/26 17:44:56 INFO mapred.JobClient:  map 15% reduce 3%
{code}






[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*
{code}
url <row>    http                            client                       referrer                                       
----------------------------------------------------------------------------------------------------------------------------
/index.php   http:protocol  <protocol>       client:ip   <referrer>       referrer:http://www.zeroboard.cn  <ip>
             http:method    <method>        
             http:code      <response code>
{code}

  was:
I made an apache log fetcher, log analyzer, social network analyzer using map/reduce on hbase table for large scale .
- 5 Terra Bytes Logs will be used. You can see at here : http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*
{code}
Table schema

url <row>    http                           client                    referrer   
--------------------------------------------------------------------------------------------------------------------
/index.php   http:protocol  <protocol>      client:ip      <ip>       referrer:http://www.zeroboard.cn  <byteSize>
             http:method    <method>        client:agent   <agent>
             http:code      <response code>
{code}


> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on HBase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> {code}
> url <row>    http                            client                       referrer                                       
> ----------------------------------------------------------------------------------------------------------------------------
> /index.php   http:protocol  <protocol>       client:ip   <referrer>       referrer:http://www.zeroboard.cn  <ip>
>              http:method    <method>        
>              http:code      <response code>
> {code}



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560765#action_12560765 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

> it seems like a very practical hbase example

Thank you very much. :=) I'll work on jimk and stack.

> Do you plan on continuing with this work later

Yes! Sir!




[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
- *(5 terabytes of logs will be used)*

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
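
The fields in the table above can be pulled out of a raw access_log line with a regular expression. The following is only a minimal sketch in Python, not the code from the attached patches: the pattern, the function name parse_access_log_line, and the field names are assumptions chosen to match the table, and the sample line is assembled from the example values in the table.

```python
import re
from datetime import datetime

# Assumed layout (Apache "combined" format):
# ip ident user [time] "method url protocol" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytesize>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # A bare "-" marks a field that the server did not record.
    for key in ("ident", "user", "bytesize", "referrer"):
        if fields[key] == "-":
            fields[key] = None
    # Example timestamp: 15/Aug/2004:10:59:38 -0800
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


# Sample line built from the example values in the table.
SAMPLE = (
    '208.177.157.164 - - [15/Aug/2004:10:59:38 -0800] '
    '"GET http://www.hadoop.co.kr/ HTTP/1.1" 200 - "-" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"'
)

if __name__ == "__main__":
    print(parse_access_log_line(SAMPLE))
```

Lines that do not match the assumed layout return None, so a map task built on this sketch could skip malformed entries instead of failing.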

  was:
I was made an apache log fetcher, log analyzer, social network analyzer using map/reduce on hbase table for large scale .
- *(5 Terra Bytes Logs wil be used)*

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|


> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
> - *(5 terabytes of logs will be used)*
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554990 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

I also think that using HQL instead of hard-coded logic may be better for the first through third levels.
Pig can then be used for high-level map/reduce programming.


> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * The url family is a historical page-move vector of the client.
> * The row-by-url layout is a user-by-document matrix.
> ** A cell can hold a numeric document visit frequency or an incoming value from a specified web site.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis



[jira] Work started: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-2480 started by Edward Yoon.

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
> I think these are generally useful examples,
> so I'd like to contribute them to hbase.



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554971 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

OK, I see.

Also, I'd like to separate the hbase shell build.xml from the hbase build.xml.




[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554386 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

See http://shell.hadoop.co.kr/PHPClient.php




[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Attachment: v02.patch




[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554992 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

Anyway, I think we need an examples section that shows how to use hbase and hql.




[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*

* The url family is a historical page-move vector of the client.
* The row-by-url layout is a user-by-document matrix.
** A cell can hold a numeric document visit frequency or an incoming value from a specified web site.
* ... etc.

{code}
ip <row>    http                            url               
-------------------------------------------------------------------
ip          http:agent     <agent>          url:URL   <referrer>
            http:protocol  <protocol>       ...
            http:method    <method>         
            http:code      <response code>
            http:bytesize  <bytesize>           
{code}
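
To make the layout above concrete, here is a rough Python sketch of how one parsed log entry might be turned into a row key and a set of column/value cells. It does not call any HBase client API. The function name to_hbase_row, the ENTRY dict, and its key names are hypothetical, and the choice of the requested URL as the qualifier in the url family is one reading of the "url:URL <referrer>" line in the schema.

```python
def to_hbase_row(entry):
    """Map one parsed log entry to (row_key, {column: value}).

    The row key is the client IP. Columns are written as "family:qualifier"
    strings, following the schema sketch above.
    """
    row_key = entry["ip"]
    cells = {
        "http:agent": entry["agent"],
        "http:protocol": entry["protocol"],
        "http:method": entry["method"],
        "http:code": entry["code"],
        "http:bytesize": entry["bytesize"],
        # Assumption: the qualifier is the requested URL, the value its referrer.
        "url:" + entry["url"]: entry["referrer"],
    }
    return row_key, cells


# Hypothetical entry; the values come from the access_log example table.
ENTRY = {
    "ip": "208.177.157.164",
    "agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "protocol": "HTTP/1.1",
    "method": "GET",
    "code": "200",
    "bytesize": "-",
    "url": "http://www.hadoop.co.kr/",
    "referrer": "-",
}

if __name__ == "__main__":
    row_key, cells = to_hbase_row(ENTRY)
    print(row_key)
    for column in sorted(cells):
        print(column, "=>", cells[column])
```

Because each requested URL becomes its own column under the url family, a single client row accumulates one cell per page visited, which is what makes the row readable as a page-move vector.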

*Log models and Applications*

* Next Page Recommendation
* Page Network Analysis
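
The description does not say how next page recommendation would be computed. One simple possibility, shown here purely as an illustration, is to count referrer-to-page transitions and suggest the most frequent successors of the current page. The function names and the input format (a list of (referrer, url) pairs) are assumptions and are not taken from the patches.

```python
from collections import Counter, defaultdict


def build_transitions(page_moves):
    """Count how often each page follows each referrer.

    page_moves is an iterable of (referrer, url) pairs; pairs whose
    referrer is None (no referrer recorded) are skipped.
    """
    counts = defaultdict(Counter)
    for referrer, url in page_moves:
        if referrer is not None:
            counts[referrer][url] += 1
    return counts


def recommend_next(counts, current_page, top_n=1):
    """Return up to top_n pages most often visited after current_page."""
    followers = counts.get(current_page, Counter())
    return [url for url, _ in followers.most_common(top_n)]


if __name__ == "__main__":
    moves = [
        ("/index.php", "/docs.php"),
        ("/index.php", "/download.php"),
        ("/index.php", "/docs.php"),
        (None, "/index.php"),
    ]
    transitions = build_transitions(moves)
    print(recommend_next(transitions, "/index.php"))
```

The same transition counts could also serve as weighted edges of a page graph for page network analysis, although that too is an interpretation rather than something the issue states.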

  was:
I made an apache log fetcher, log analyzer, social network analyzer using map/reduce on hbase table for large scale .
- 5 Terra Bytes Logs will be used. You can see at here : http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*

* url family is a historical page-move vector of client.
* row by url is a user by document matrix. 
** cell can be a numeric value of document visit frequency or a incoming value from specified web.
* ... etc.

{code}
ip <row>    http                            url               
-------------------------------------------------------------------
ip          http:agent     <agent>          url:URL   <referrer>
            http:protocol  <protocol>       ...
            http:method    <method>         
            http:code      <response code>
            http:bytesize  <bytesize>           
{code}


> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * The url family is a historical page-move vector of the client.
> * The row-by-url layout is a user-by-document matrix.
> ** A cell can hold a numeric document visit frequency or an incoming value from a specified web site.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Fredrik Hedberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554559 ] 

Fredrik Hedberg commented on HADOOP-2480:
-----------------------------------------

Pardon my ignorance. It's clearly inefficient to use the HBase IdentityReducer I suppose -  unless somebody comes up with a smart batch insertion method and the emitted rows are shuffled to the right regionserver node (?).




[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560761#action_12560761 ] 

Otis Gospodnetic commented on HADOOP-2480:
------------------------------------------

Ed - I was really looking forward to this work getting into contrib (it seems like a very practical hbase example).  Do you plan on continuing with this work later, or are you stopping the work?  I was especially curious to see what you have in mind for social network analysis.

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * The url family is a historical page-move vector of the client.
> * The row-by-url layout is a user-by-document matrix.
> ** A cell can hold a numeric document visit frequency or an incoming value from a specified web site.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
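
As a rough illustration of the field layout in the table above, the following Python sketch splits one access_log line into named fields. The regular expression, the group names, and the helper `parse_access_log_line` are assumptions made for this example and are not taken from the attached patches; the sample line is assembled from the example data elements in the table.

```python
import re

# Hypothetical pattern for the nine fields listed in the table above.
# The group names are illustrative, not part of the original patches.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytesize>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Sample line assembled from the example data elements in the table.
SAMPLE_LINE = (
    '208.177.157.164 - - [15/Aug/2004:10:59:38 -0800] '
    '"GET http://www.hadoop.co.kr/ HTTP/1.1" 200 - "-" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"'
)

entry = parse_access_log_line(SAMPLE_LINE)
print(entry["ip"], entry["method"], entry["code"])
```

Lines that do not follow this layout (for example, malformed request strings) return `None` here, so a real fetcher would need to decide whether to skip or log them.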

*Table schema*
{code}
Table schema

url <row>    http                           client                    referrer   
--------------------------------------------------------------------------------------------------------------------
/index.php   http:protocol  <protocol>      client:ip      <ip>       referrer:http://www.zeroboard.cn  <byteSize>
             http:method    <method>        client:agent   <agent>
             http:code      <response code>
{code}
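
To make the url-keyed layout above concrete, here is a minimal Python sketch that turns one parsed log entry into a row key plus family:qualifier cells. It uses plain dictionaries rather than any HBase client API, and the function name `to_url_row`, the dictionary keys, and the byte size value are hypothetical.

```python
def to_url_row(entry):
    """Map one parsed log entry (a plain dict) to (row key, cells).

    Cell names follow the family:qualifier layout of the schema above.
    Plain dicts only; this is not the HBase client API.
    """
    cells = {
        "http:protocol": entry["protocol"],
        "http:method": entry["method"],
        "http:code": entry["code"],
        "client:ip": entry["ip"],
        "client:agent": entry["agent"],
        # The referrer URL is the column qualifier; the value is the byte size.
        "referrer:" + entry["referrer"]: entry["bytesize"],
    }
    return entry["resource"], cells


# Illustrative entry; the field names and the byte size are assumptions.
example_entry = {
    "ip": "208.177.157.164",
    "method": "GET",
    "resource": "/index.php",
    "protocol": "HTTP/1.1",
    "code": "200",
    "bytesize": "2326",  # hypothetical value
    "referrer": "http://www.zeroboard.cn",
    "agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
}

row_key, cells = to_url_row(example_entry)
print(row_key)
for name in sorted(cells):
    print(name, cells[name])
```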

  was:
I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|


Define the table schema for modeling.

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> {code}
> Table schema
> url <row>    http                           client                    referrer   
> --------------------------------------------------------------------------------------------------------------------
> /index.php   http:protocol  <protocol>      client:ip      <ip>       referrer:http://www.zeroboard.cn  <byteSize>
>              http:method    <method>        client:agent   <agent>
>              http:code      <response code>
> {code}



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

        Fix Version/s:     (was: 0.16.0)
    Affects Version/s:     (was: 0.16.0)

Removing the fix version because I think this is difficult to do on the current HBase.


> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * The url family is a historical page-move vector of the client.
> * Rows crossed with url columns form a user-by-document matrix.
> ** A cell can hold a numeric value, such as a document visit frequency or an incoming value from a specified web site.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
* 5 terabytes of logs will be used.
* You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*

* The url family is a historical page-move vector of the client.
* Rows crossed with url columns form a user-by-document matrix.
** A cell can hold a numeric value, such as a document visit frequency or an incoming value from a specified web site.
* ... etc.

{code}
ip <row>    http                            url               
-------------------------------------------------------------------
ip          http:agent     <agent>          url:URL   <referrer>
            http:protocol  <protocol>       ...
            http:method    <method>         
            http:code      <response code>
            http:bytesize  <bytesize>           
{code}
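
The user-by-document matrix mentioned in the bullets above the schema can be pictured with a small in-memory sketch: each ip is a row, each url is a column, and the cell holds the visit frequency. The function name `build_visit_matrix` and the sample visits (including the second IP address) are assumptions for illustration only; this does not touch HBase.

```python
from collections import Counter, defaultdict


def build_visit_matrix(visits):
    """Count visits per (ip, url) pair.

    `visits` is an iterable of (ip, url) tuples. The result maps each ip
    (the row key) to a Counter of url -> visit frequency.
    """
    matrix = defaultdict(Counter)
    for ip, url in visits:
        matrix[ip][url] += 1
    return matrix


# Hypothetical sample data.
visits = [
    ("208.177.157.164", "/index.php"),
    ("208.177.157.164", "/FrontPage"),
    ("208.177.157.164", "/index.php"),
    ("10.0.0.7", "/FrontPage"),
]

matrix = build_visit_matrix(visits)
print(matrix["208.177.157.164"]["/index.php"])  # 2
```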

*Log models and Applications*

* Next Page Recommendation
* Page Network Analysis
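
The ticket does not say how Next Page Recommendation would be computed. One simple possibility, offered here only as an assumption, is to count page-to-page transitions in each client's page-move vector and suggest the most frequent successor. The function names and sample vectors below are hypothetical.

```python
from collections import Counter, defaultdict


def count_transitions(page_vectors):
    """Count how often each page is directly followed by another page."""
    transitions = defaultdict(Counter)
    for pages in page_vectors:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def recommend_next(transitions, page):
    """Return the most frequent successor of `page`, or None if unseen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical page-move vectors, one list per client.
page_vectors = [
    ["/index.php", "/FrontPage", "/download"],
    ["/index.php", "/FrontPage"],
    ["/index.php", "/download"],
]

transitions = count_transitions(page_vectors)
print(recommend_next(transitions, "/index.php"))
```

A first-order count like this ignores longer browsing history; it is meant only to show how the url family could feed such a model.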

  was:
I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*

* The url family is a historical page-move vector of the client.
* Rows crossed with url columns form a user-by-document matrix.
** A cell can hold a numeric value, such as a document visit frequency or an incoming value from a specified web site.
* ... etc.

{code}
ip <row>    http                            url               
-------------------------------------------------------------------
ip          http:agent     <agent>          url:URL   <referrer>
            http:protocol  <protocol>       ...
            http:method    <method>         
            http:code      <response code>
            http:bytesize  <bytesize>           
{code}

*Log models and Applications*

* Next Page Recommendation
* Page Network Analysis


> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
> * 5 terabytes of logs will be used.
> * You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * The url family is a historical page-move vector of the client.
> * Rows crossed with url columns form a user-by-document matrix.
> ** A cell can hold a numeric value, such as a document visit frequency or an incoming value from a specified web site.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554502 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

Oh, it was just an HBase setup problem.

{code}
hql > jar ./build/contrib/hbase/hadoop-0.16.0-dev-hbase-examples.jar logfetcher udanax/logs access_log;
1. access_log table already exists.
2. access_log files fetching using map/reduce
07/12/27 09:40:45 INFO mapred.FileInputFormat: Total input paths to process : 1
07/12/27 09:40:45 INFO mapred.JobClient: Running job: job_200712270938_0001
07/12/27 09:40:46 INFO mapred.JobClient:  map 0% reduce 0%
07/12/27 09:41:17 INFO mapred.JobClient:  map 1% reduce 0%
07/12/27 09:41:32 INFO mapred.JobClient:  map 2% reduce 0%
07/12/27 09:41:56 INFO mapred.JobClient:  map 3% reduce 0%
07/12/27 09:42:12 INFO mapred.JobClient:  map 4% reduce 0%
07/12/27 09:42:31 INFO mapred.JobClient:  map 5% reduce 0%
07/12/27 09:42:51 INFO mapred.JobClient:  map 6% reduce 0%
07/12/27 09:43:11 INFO mapred.JobClient:  map 7% reduce 0%
07/12/27 09:43:27 INFO mapred.JobClient:  map 8% reduce 0%
07/12/27 09:43:47 INFO mapred.JobClient:  map 9% reduce 0%
07/12/27 09:44:07 INFO mapred.JobClient:  map 10% reduce 0%
07/12/27 09:44:27 INFO mapred.JobClient:  map 11% reduce 0%
07/12/27 09:44:46 INFO mapred.JobClient:  map 12% reduce 0%
07/12/27 09:45:02 INFO mapred.JobClient:  map 13% reduce 0%
07/12/27 09:45:21 INFO mapred.JobClient:  map 14% reduce 0%
07/12/27 09:45:37 INFO mapred.JobClient:  map 15% reduce 0%
07/12/27 09:45:57 INFO mapred.JobClient:  map 16% reduce 0%
07/12/27 09:46:17 INFO mapred.JobClient:  map 17% reduce 0%
07/12/27 09:46:37 INFO mapred.JobClient:  map 18% reduce 0%
07/12/27 09:46:52 INFO mapred.JobClient:  map 19% reduce 0%
07/12/27 09:47:02 INFO mapred.JobClient:  map 19% reduce 1%
07/12/27 09:47:12 INFO mapred.JobClient:  map 20% reduce 1%
07/12/27 09:47:17 INFO mapred.JobClient:  map 20% reduce 3%
07/12/27 09:47:32 INFO mapred.JobClient:  map 21% reduce 3%
07/12/27 09:47:52 INFO mapred.JobClient:  map 22% reduce 3%
07/12/27 09:48:07 INFO mapred.JobClient:  map 23% reduce 3%
07/12/27 09:48:26 INFO mapred.JobClient:  map 24% reduce 3%
07/12/27 09:48:43 INFO mapred.JobClient:  map 25% reduce 3%
07/12/27 09:49:02 INFO mapred.JobClient:  map 26% reduce 3%
07/12/27 09:49:19 INFO mapred.JobClient:  map 27% reduce 3%
07/12/27 09:49:38 INFO mapred.JobClient:  map 28% reduce 3%
07/12/27 09:49:53 INFO mapred.JobClient:  map 29% reduce 3%
07/12/27 09:50:08 INFO mapred.JobClient:  map 29% reduce 5%
07/12/27 09:50:12 INFO mapred.JobClient:  map 30% reduce 7%
07/12/27 09:50:28 INFO mapred.JobClient:  map 31% reduce 7%
07/12/27 09:50:47 INFO mapred.JobClient:  map 32% reduce 7%
07/12/27 09:51:07 INFO mapred.JobClient:  map 33% reduce 7%
07/12/27 09:51:22 INFO mapred.JobClient:  map 34% reduce 7%
07/12/27 09:51:39 INFO mapred.JobClient:  map 35% reduce 7%
07/12/27 09:51:59 INFO mapred.JobClient:  map 36% reduce 7%
07/12/27 09:52:17 INFO mapred.JobClient:  map 37% reduce 7%
07/12/27 09:52:34 INFO mapred.JobClient:  map 38% reduce 7%
07/12/27 09:52:43 INFO mapred.JobClient:  map 38% reduce 9%
07/12/27 09:52:53 INFO mapred.JobClient:  map 39% reduce 9%
07/12/27 09:53:09 INFO mapred.JobClient:  map 39% reduce 11%
07/12/27 09:53:12 INFO mapred.JobClient:  map 40% reduce 11%
07/12/27 09:53:32 INFO mapred.JobClient:  map 41% reduce 11%
07/12/27 09:53:47 INFO mapred.JobClient:  map 42% reduce 11%
07/12/27 09:54:07 INFO mapred.JobClient:  map 43% reduce 11%
07/12/27 09:54:22 INFO mapred.JobClient:  map 44% reduce 11%
07/12/27 09:54:36 INFO mapred.JobClient:  map 42% reduce 11%
{code}

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
> I think these are general usage examples, so I'd like to contribute them to HBase.



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554387 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

Once the code is done, I'll report the 5 TB log test results.

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
> I think these are general usage examples, so I'd like to contribute them to HBase.



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*
{code}
url <row>    http                            client                  referrer                                       
------------------------------------------------------------------------------------------------------------------------
/FrontPage   http:protocol  <protocol>       client:ip   <URL>       referrer:http://www.google.co.kr/q?=hadoop  <IP>
             http:method    <method>         ....
             http:code      <response code>
             http:bytesize  <bytesize>
{code}

  was:
I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
- 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

*Table schema*
{code}
url <row>    http                            client                       referrer                                       
----------------------------------------------------------------------------------------------------------------------------
/index.php   http:protocol  <protocol>       client:ip   <referrer>       referrer:http://www.zeroboard.cn  <ip>
             http:method    <method>        
             http:code      <response code>
{code}


> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> {code}
> url <row>    http                            client                  referrer                                       
> ------------------------------------------------------------------------------------------------------------------------
> /FrontPage   http:protocol  <protocol>       client:ip   <URL>       referrer:http://www.google.co.kr/q?=hadoop  <IP>
>              http:method    <method>         ....
>              http:code      <response code>
>              http:bytesize  <bytesize>
> {code}



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554993 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

{code}
hql > jar ./build/contrib/hbase/hadoop-0.16.0-dev-hbase-examples.jar logfetcher udanax/logs access_log;
1. Creating the access_log table.

   hql > CREATE TABLE access_log ('http', 'url' MAX_LENGTH:10000); _
   Please wait ... creating
   1 table was created (0.1 sec)

   hql > SHOW TABLES;
   ...

2. access_log files fetching using map/reduce

07/12/27 09:40:45 INFO mapred.FileInputFormat: Total input paths to process : 1
07/12/27 09:40:45 INFO mapred.JobClient: Running job: job_200712270938_0001
07/12/27 09:40:46 INFO mapred.JobClient:  map 0% reduce 0%
07/12/27 09:41:17 INFO mapred.JobClient:  map 1% reduce 0%

   hql > SELECT url FROM access_log LIMIT=10;
   ...

3. ...
{code}

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce over HBase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * The url family is a historical page-move vector of the client.
> * Rows crossed with url columns form a user-by-document matrix.
> ** A cell can hold a numeric value, such as a document visit frequency or an incoming value from a specified web site.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554505 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

- It should be 'Map Only'.

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
> I think these are generally useful usage examples,
> so I'd like to contribute them to hbase.



[jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554988 ] 

Edward Yoon commented on HADOOP-2480:
-------------------------------------

I think the best way to use hbase is the following programming sequence:
table creation -> data gathering (loop) -> creation of arbitrary new relations by relational algebraic operations -> high-level map/reduce programming.
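The four steps in that sequence can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: a plain dict stands in for an hbase table, the second IP address and the /docs URL are made-up sample data, and the function names are hypothetical rather than anything from the hbase shell.

```python
from collections import Counter

# 1. Table creation: row key -> {column: value}. A dict stands in for an hbase table.
table = {}

# 2. Data gathering (loop): each fetched (ip, url, referrer) record adds a url cell.
fetched = [
    ("208.177.157.164", "http://www.hadoop.co.kr/", "-"),
    ("208.177.157.164", "http://www.hadoop.co.kr/docs", "http://www.hadoop.co.kr/"),
    ("192.0.2.1", "http://www.hadoop.co.kr/docs", "http://www.hadoop.co.kr/"),
]
for ip, url, referrer in fetched:
    table.setdefault(ip, {})["url:" + url] = referrer

# 3. New relation by relational algebra:
#    selection (referrer is present) followed by projection onto (ip, url).
linked_visits = [
    (ip, column[len("url:"):])
    for ip, cells in table.items()
    for column, referrer in cells.items()
    if referrer != "-"
]


# 4. Map/reduce over the derived relation: emit (url, 1), then sum per url.
def map_phase(relation):
    for _ip, url in relation:
        yield url, 1


def reduce_phase(pairs):
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


visits_per_url = reduce_phase(map_phase(linked_visits))
```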



> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
> *Table schema*
> * The url family holds each client's historical page-move vector.
> * The rows (clients) by url columns form a user-by-document matrix.
> ** A cell can hold a numeric value such as the document visit frequency, or an incoming value from a specified web page.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis
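One possible reading of "Next Page Recommendation" is to count how often clients move from one page to another and suggest the most frequent follower. The sketch below assumes that reading; the function names and the sample paths are hypothetical, and the issue does not say which model the author actually has in mind.

```python
from collections import Counter, defaultdict


def build_transitions(visits):
    """Count referrer -> url moves from an iterable of (referrer, url) pairs."""
    transitions = defaultdict(Counter)
    for referrer, url in visits:
        if referrer != "-":  # "-" means the log recorded no referrer
            transitions[referrer][url] += 1
    return transitions


def recommend_next(transitions, page):
    """Return the page most often visited from `page`, or None if nothing is known."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up page moves, standing in for (referrer, url) pairs read from the url family.
sample_visits = [
    ("-", "/"),
    ("/", "/docs"),
    ("/", "/docs"),
    ("/", "/faq"),
]
sample_transitions = build_transitions(sample_visits)
suggestion = recommend_next(sample_transitions, "/")
```

The same transition counts could also serve as edge weights for the page network analysis mentioned alongside it.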



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
I think these are generally useful usage examples,
so I'd like to contribute them to hbase.


*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

  was:
I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
I think these are generally useful usage examples,
so I'd like to contribute them to hbase.



> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
> I think these are generally useful usage examples,
> so I'd like to contribute them to hbase.
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
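The fields in the Access_log Entry table can be pulled out of a raw log line with a regular expression. The sketch below assumes the fields appear on one line in the order listed in the table, separated by spaces (the usual Apache combined log layout); the pattern, the group names, and the parse_line helper are illustrative and are not taken from the attached patch.

```python
import re

# One named group per field, in the order listed in the Access_log Entry table.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytesize>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# A line assembled from the example values in the table.
sample_line = (
    '208.177.157.164 - - [15/Aug/2004:10:59:38 -0800] '
    '"GET http://www.hadoop.co.kr/ HTTP/1.1" 200 - "-" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"'
)
sample_record = parse_line(sample_line)
```

All values are kept as strings, so a missing byte count stays "-" rather than being converted to a number.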



[jira] Updated: (HADOOP-2480) [Hbase Shell] Log Analysis Examples

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2480:
--------------------------------

    Description: 
I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
- *(5 terabytes of logs will be used)*

*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|

  was:
I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
I think these are generally useful usage examples,
so I'd like to contribute them to hbase.


*Access_log Entry*
||Example Data Element||Description||
|208.177.157.164|IP address of the client requesting the web page|
|-|Identity of the client; typically blank for modern browsers, which hide this information|
|-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
|[15/Aug/2004:10:59:38 -0800] |Time the request was made|
|"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
|200|Status code for the request. 200 means it was successfully handled|
|-|Number of bytes transferred to the client in response to this request|
|"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
|"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|


Update description

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that use map/reduce on hbase tables at large scale.
> - *(5 terabytes of logs will be used)*
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically always blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client making the request|
