Posted to dev@pig.apache.org by "Santhosh Srinivasan (JIRA)" <ji...@apache.org> on 2009/02/19 20:00:03 UTC
[jira] Created: (PIG-683) Semantics of TOKENIZE are not clear
Semantics of TOKENIZE are not clear
-----------------------------------
Key: PIG-683
URL: https://issues.apache.org/jira/browse/PIG-683
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Fix For: types_branch
The semantics of TOKENIZE are not clear. In its current form, TOKENIZE takes a string as input and returns a bag. The bag contains one tuple per token, and each tuple contains a single field: the token itself. A better approach would be to return a tuple (instead of a bag) that contains as many fields as there are tokens.
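To make the two shapes concrete, here is a small self-contained Java sketch. It models them with plain lists rather than Pig's real Tuple and DataBag classes. The class and method names (TokenizeSemantics, tokenizeToBag, tokenizeToTuple) are hypothetical. Splitting on whitespace with java.util.StringTokenizer is also an assumption, since TOKENIZE's actual delimiter set may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeSemantics {
    // Current TOKENIZE shape, modelled with plain lists:
    // a "bag" (outer list) holding one single-field "tuple" (inner list) per token.
    // Assumption: tokens are split on whitespace (StringTokenizer's default delimiters).
    static List<List<String>> tokenizeToBag(String input) {
        List<List<String>> bag = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(input);
        while (tok.hasMoreTokens()) {
            List<String> tuple = new ArrayList<>();
            tuple.add(tok.nextToken());
            bag.add(tuple);
        }
        return bag;
    }

    // Proposed alternative shape: one "tuple" with one field per token.
    static List<String> tokenizeToTuple(String input) {
        List<String> tuple = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(input);
        while (tok.hasMoreTokens()) {
            tuple.add(tok.nextToken());
        }
        return tuple;
    }

    public static void main(String[] args) {
        String line = "the quick brown fox";
        System.out.println(tokenizeToBag(line));   // [[the], [quick], [brown], [fox]]
        System.out.println(tokenizeToTuple(line)); // [the, quick, brown, fox]
    }
}
```

One plausible reason to keep the bag form is how FLATTEN behaves in Pig Latin. Flattening a bag yields one output row per tuple, so each token becomes its own record (the usual word-count pattern). Flattening a tuple instead spreads its fields across the columns of a single row.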
On a secondary note, the outputSchema method in TOKENIZE is broken. It declares the output as just a string, but it should declare a bag of tuples, each containing a single string.
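The nesting that outputSchema should describe can be illustrated with a toy Java model. The Field class below is a hypothetical stand-in, not Pig's Schema or FieldSchema API. It prints schemas in Pig's brace/parenthesis notation, using chararray, Pig's string type, for the token field.

```java
import java.util.List;

public class TokenizeSchemaSketch {
    // Hypothetical, minimal stand-in for a field schema: a type name plus nested fields.
    // This is NOT Pig's Schema/FieldSchema API; it only illustrates the nesting.
    static final class Field {
        final String type;
        final List<Field> inner;

        Field(String type, List<Field> inner) {
            this.type = type;
            this.inner = inner;
        }

        // Render bags as {...} and tuples as (...), as in Pig schema notation.
        @Override
        public String toString() {
            switch (type) {
                case "bag":   return "{" + join() + "}";
                case "tuple": return "(" + join() + ")";
                default:      return type;
            }
        }

        private String join() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < inner.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(inner.get(i));
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Field token = new Field("chararray", List.of());
        // What the issue says outputSchema currently reports: just a string.
        Field reported = token;
        // What it should report: a bag of tuples, each holding one string.
        Field expected = new Field("bag", List.of(new Field("tuple", List.of(token))));
        System.out.println(reported); // chararray
        System.out.println(expected); // {(chararray)}
    }
}
```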
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-683) Semantics of TOKENIZE are not clear
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-683.
--------------------------------
Resolution: Invalid
We can't change the semantics of an existing function. If users need a different interface, they can create another function that suits their needs.