Posted to dev@vxquery.apache.org by "Vinayak Borkar (JIRA)" <ji...@apache.org> on 2012/06/20 03:34:43 UTC

[jira] [Comment Edited] (VXQUERY-34) Basic String Functions

    [ https://issues.apache.org/jira/browse/VXQUERY-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397207#comment-13397207 ] 

Vinayak Borkar edited comment on VXQUERY-34 at 6/20/12 1:34 AM:
----------------------------------------------------------------

Preston,

The string-length function looks good. I have a few comments about the upper-case and lower-case functions.

1. The functions currently allocate a new byte[] on *every* invocation. I would suggest using edu.uci.ics.hyracks.dataflow.common.data.accessors.ArrayBackedValueStorage as the storage for the data. This class wraps a growable byte array that allocates a new array only when the existing one is too small, and it tracks the used size separately. You can call the reset() method to set the "used bytes" back to 0 without discarding the internal byte array.
2. It looks like you first walk over the input string and convert each character to its upper/lower case value just to measure the length of the new string. Another strategy is to skip two bytes in the result byte array (actually three, because the first byte will be the tag in your case) and start appending the characters after transcoding. At the end, go back and patch the two bytes that hold the UTF-8 length with the actual length. This way you do not have to process each string twice.
3. Finally, to address your code reuse question, you could have upper and lower case functions both extend an AbstractStringTranscodingFunction which has one protected abstract method:

protected abstract char transcodeCharacter(char c);

You could then move all the code that you have in the computation to the base class, while calling the transcodeCharacter method to get the "converted" character. In the concrete classes you will need to implement the transcodeCharacter method to return the upper/lower case character appropriately.
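
To make the three suggestions concrete, here is a rough, self-contained sketch of how they might fit together. It is not the patch's code and it does not use the Hyracks classes: ReusableBuffer is a hypothetical stand-in for ArrayBackedValueStorage, and the evaluate() signature, the tag value, the big-endian two-byte length, and the class names UpperCaseFunction and LowerCaseFunction are all assumptions made for illustration. The encoder writes each Java char on its own in 1 to 3 bytes (so surrogate pairs are not combined), which is a simplification.

```java
import java.util.Arrays;

// Hypothetical stand-in for a reusable, growable byte buffer.
// This is NOT the Hyracks ArrayBackedValueStorage API.
final class ReusableBuffer {
    private byte[] bytes = new byte[64];
    private int length;

    // Forget the used bytes but keep the backing array for the next call.
    void reset() { length = 0; }

    // Append one byte, growing the array only when it is full.
    void write(int b) {
        if (length == bytes.length) {
            bytes = Arrays.copyOf(bytes, bytes.length * 2);
        }
        bytes[length++] = (byte) b;
    }

    // Overwrite a byte that was already written (used to patch the length).
    void patch(int index, int b) { bytes[index] = (byte) b; }

    byte[] array() { return bytes; }
    int length() { return length; }
}

abstract class AbstractStringTranscodingFunction {
    // Allocated once per function instance, reused on every invocation.
    private final ReusableBuffer out = new ReusableBuffer();

    protected abstract char transcodeCharacter(char c);

    // Assumed result layout: [tag][length high byte][length low byte][UTF-8 bytes].
    ReusableBuffer evaluate(byte tag, CharSequence input) {
        out.reset();
        out.write(tag);
        out.write(0); // placeholder for the length, patched below
        out.write(0);
        for (int i = 0; i < input.length(); i++) {
            writeUtf8(transcodeCharacter(input.charAt(i)));
        }
        int utfLength = out.length() - 3;
        if (utfLength > 0xFFFF) {
            throw new IllegalArgumentException("string too long for a 2-byte length");
        }
        out.patch(1, utfLength >>> 8);
        out.patch(2, utfLength);
        return out;
    }

    // Encode a single char in 1 to 3 bytes; surrogates are encoded individually.
    private void writeUtf8(char c) {
        if (c < 0x80) {
            out.write(c);
        } else if (c < 0x800) {
            out.write(0xC0 | (c >> 6));
            out.write(0x80 | (c & 0x3F));
        } else {
            out.write(0xE0 | (c >> 12));
            out.write(0x80 | ((c >> 6) & 0x3F));
            out.write(0x80 | (c & 0x3F));
        }
    }
}

final class UpperCaseFunction extends AbstractStringTranscodingFunction {
    @Override
    protected char transcodeCharacter(char c) { return Character.toUpperCase(c); }
}

final class LowerCaseFunction extends AbstractStringTranscodingFunction {
    @Override
    protected char transcodeCharacter(char c) { return Character.toLowerCase(c); }
}
```

With this shape, a call such as new UpperCaseFunction().evaluate((byte) 1, "abc") walks the input once and leaves the tag, the patched length, and the bytes for "ABC" in the shared buffer. One thing to keep in mind with a char-to-char method is that Character.toUpperCase(char) maps one char to one char, so a character like 'ß', whose full upper-case form is two characters, comes back unchanged.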

Thanks,
Vinayak
                
> Basic String Functions 
> -----------------------
>
>                 Key: VXQUERY-34
>                 URL: https://issues.apache.org/jira/browse/VXQUERY-34
>             Project: VXQuery
>          Issue Type: Task
>            Reporter: Preston Carman
>              Labels: patch
>         Attachments: BasicStringFunctions2.patch
>
>
> Build the basic string functions needed to help with basic queries.
> fn:concat - Concatenates two or more xs:anyAtomicType arguments cast to xs:string.
> fn:string-join - Returns the xs:string produced by concatenating a sequence of xs:strings using an optional separator.
> fn:substring - Returns the xs:string located at a specified place within an argument xs:string.
> fn:string-length - Returns the length of the argument.
> fn:upper-case - Returns the upper-cased value of the argument.
> fn:lower-case - Returns the lower-cased value of the argument.
