Posted to dev@pig.apache.org by "Earl Cahill (JIRA)" <ji...@apache.org> on 2008/10/10 07:31:45 UTC

[jira] Created: (PIG-486) want to be able to extract searchEngine from a url

want to be able to extract searchEngine from a url
--------------------------------------------------

                 Key: PIG-486
                 URL: https://issues.apache.org/jira/browse/PIG-486
             Project: Pig
          Issue Type: New Feature
            Reporter: Earl Cahill


Given a URL (for example, the referer field of an Apache access log), I want to retrieve the name of the search engine it belongs to.

With Pig Latin usage like

searchEngine = FOREACH row GENERATE org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchEngineExtractor(referer);

For example, the URL

http://www.google.com/search?hl=en&safe=active&rls=GGLG,GGLG:2005-24,GGLG:en&q=purpose+of+life&btnG=Search

would return

Google
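The request comes down to mapping a referrer's host to a display name. Below is a minimal sketch of that idea in plain Java, with no Pig dependency. The class name `SearchEngineLookup`, its three-entry table and the label-matching rule are hypothetical. This is not the piggybank `SearchEngineExtractor` code, whose engine names were taken from CPAN's URI::ParseSearchString.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: referrer URL -> search engine name (not the piggybank class). */
public class SearchEngineLookup {

    // Hypothetical table: host label -> display name. Illustrative only.
    private static final Map<String, String> ENGINES = new LinkedHashMap<>();
    static {
        ENGINES.put("google", "Google");
        ENGINES.put("yahoo", "Yahoo");
        ENGINES.put("ask", "Ask");
    }

    /** Returns the engine name, or null if the URL is missing, malformed or unknown. */
    public static String extract(String url) {
        if (url == null || url.isEmpty()) {
            return null;
        }
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // malformed referer
        }
        if (host == null) {
            return null; // e.g. "-" in an access log, or a relative URL
        }
        // Compare whole dot-separated labels, so "www.google.co.uk" matches
        // "google" but "flask.example.com" does not match "ask".
        for (String label : host.toLowerCase(Locale.ROOT).split("\\.")) {
            String name = ENGINES.get(label);
            if (name != null) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String referer = "http://www.google.com/search?hl=en&safe=active"
                + "&rls=GGLG,GGLG:2005-24,GGLG:en&q=purpose+of+life&btnG=Search";
        System.out.println(extract(referer)); // prints Google
    }
}
```

The sketch looks only at the host, so a non-search page such as mail.google.com would also be labelled Google. A fuller version would probably also check the path or query parameters for each engine.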



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-486) want to be able to extract searchEngine from a url

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-486:
----------------------------

    Attachment:     (was: SearchEngineExtractor-PIG-486)



[jira] Updated: (PIG-486) want to be able to extract searchEngine from a url

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-486:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch checked in. Thanks, Earl, for your contribution.



[jira] Commented: (PIG-486) want to be able to extract searchEngine from a url

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638598#action_12638598 ] 

Alan Gates commented on PIG-486:
--------------------------------

I think the source of the search engine names is fine, except that the list is static and will therefore get out of date. I don't know a way around this. But the list of known engines should be called out in the Javadoc so users can easily see whether the engines they are interested in are included.
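One way to act on this suggestion is to keep the Javadoc list and the lookup table side by side, and to expose the table read-only so users can check coverage both in the docs and at runtime. The sketch below assumes a hypothetical `KnownEngines` class with a three-entry table. It does not reflect the list that was actually committed.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of documenting the supported engines.
 *
 * <p>Supported search engines (static list; may go out of date):
 * <ul>
 *   <li>Ask</li>
 *   <li>Google</li>
 *   <li>Yahoo</li>
 * </ul>
 */
public class KnownEngines {

    // Host label -> display name. Keep this table and the Javadoc list above in sync.
    private static final Map<String, String> ENGINES;
    static {
        Map<String, String> m = new TreeMap<>(); // sorted, so output matches the doc order
        m.put("ask", "Ask");
        m.put("google", "Google");
        m.put("yahoo", "Yahoo");
        ENGINES = Collections.unmodifiableMap(m);
    }

    /** Read-only view so callers can check which engines are covered. */
    public static Map<String, String> supported() {
        return ENGINES;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : supported().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Returning an unmodifiable view means callers can inspect the list but cannot change it at runtime. Updating the list still requires editing the table, as the comment above anticipates.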



[jira] Updated: (PIG-486) want to be able to extract searchEngine from a url

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-486:
----------------------------

    Attachment: SearchEngineExtractor-PIG-486

I took the search engine names from

http://search.cpan.org/~sden/URI-ParseSearchString-2.6/

and they are certainly up for debate.



[jira] Updated: (PIG-486) want to be able to extract searchEngine from a url

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-486:
----------------------------

    Status: Patch Available  (was: Open)

The patch contains:

org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchEngineExtractor
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchEngineExtractor




[jira] Updated: (PIG-486) want to be able to extract searchEngine from a url

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-486:
----------------------------

    Attachment: SearchEngineExtractor-PIG-486

Added the list of supported search engines to the Javadoc.

