Posted to dev@accumulo.apache.org by "Josh Elser (JIRA)" <ji...@apache.org> on 2012/07/21 19:36:33 UTC

[jira] [Created] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Josh Elser created ACCUMULO-697:
-----------------------------------

             Summary: Break Scanner parameterization from Key,Value to Key,{Something}
                 Key: ACCUMULO-697
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-697
             Project: Accumulo
          Issue Type: Improvement
          Components: client
    Affects Versions: 1.5.0
            Reporter: Josh Elser
            Assignee: Josh Elser


When writing a custom iterator, the iterator often has semantic knowledge of what each Key/Value it returns actually means (e.g., a word-count iterator returns Key/Value pairs, but the Value really holds an Integer/Long count). This forces the client to know what is going to be returned and to handle the cast/transformation itself.

I believe it should be fairly straightforward to encapsulate this transformation inside the Accumulo client code. I plan on investigating the possibility of changing the ScannerBase impl, or perhaps making a TypedScannerBase, in which the iterator at the "top" of the stack for a scan can return something other than a Value to the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "David Medinets (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421803#comment-13421803 ] 

David Medinets commented on ACCUMULO-697:
-----------------------------------------

Thank you. I've used generics before but hadn't explicitly thought about K and V simply being placeholders.
                

        

[jira] [Commented] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "Josh Elser (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421529#comment-13421529 ] 

Josh Elser commented on ACCUMULO-697:
-------------------------------------

> What are K and V? Where are they defined? I know they mean key and value but I'd like to grok the fundamentals. Sorry for this basic question, and pardon me for asking it here, but I don't know what kind of web search would give me an answer.

http://en.wikipedia.org/wiki/Generics_in_Java
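
As a rough illustration of the point behind that link: K and V are not defined anywhere in particular. They are type parameters, declared in the angle brackets after a class or interface name and bound to concrete types at the point of use. The Pair class below is a hypothetical example written for this note, not an Accumulo class.

```java
// Hypothetical class for illustration only; not part of Accumulo.
// K and V are type parameters declared in the angle brackets after the class name.
final class Pair<K, V> {
  private final K key;
  private final V value;

  Pair(K key, V value) {
    this.key = key;
    this.value = value;
  }

  K getKey() {
    return key;
  }

  V getValue() {
    return value;
  }
}

final class GenericsDemo {
  public static void main(String[] args) {
    // At this use site, K is bound to String and V to Long.
    Pair<String, Long> wordCount = new Pair<String, Long>("accumulo", 42L);
    long count = wordCount.getValue(); // no cast needed; the compiler knows V is Long
    System.out.println(wordCount.getKey() + " -> " + count); // prints "accumulo -> 42"
  }
}
```

The same class could be used as Pair<Integer, String> elsewhere; the letters are only placeholders for whatever types the caller supplies.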
                

        

[jira] [Commented] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "David Medinets (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421523#comment-13421523 ] 

David Medinets commented on ACCUMULO-697:
-----------------------------------------

What are K and V? Where are they defined? I know they mean key and value but I'd like to grok the fundamentals. Sorry for this basic question, and pardon me for asking it here, but I don't know what kind of web search would give me an answer.
                

        

[jira] [Commented] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "William Slacum (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419898#comment-13419898 ] 

William Slacum commented on ACCUMULO-697:
-----------------------------------------

Does this work with client side iterators only? The top level iterator will usually be running server side, so you'll need to have an interface to set a serializer for some object at the top of the stack (and possibly another deserializer if you have custom iterators beneath it) and set a deserializer at the client level. At some point on the server side, the iterator will have to transform the user type into a {{byte[]}}, which is basically what a {{Value}} is. On the client side, the user will still have to supply code to deserialize the {{byte[]}}, which can already be accomplished with no code changes by using a library such as Google Collections/Guava or Apache Commons.
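
A minimal sketch of the serialize/deserialize pairing described above, assuming a Long count as the user type. The ValueCodec interface and LongCodec class are hypothetical names invented for this illustration, not an existing Accumulo API; a plain byte[] stands in for what a Value would carry.

```java
import java.nio.ByteBuffer;

// Hypothetical interface for illustration; the name and methods are assumptions,
// not an existing Accumulo API.
interface ValueCodec<T> {
  byte[] encode(T value); // would run in the top-level iterator on the server side

  T decode(byte[] bytes); // would run on the client side
}

// Fixed-width, big-endian encoding of a Long (8 bytes).
final class LongCodec implements ValueCodec<Long> {
  @Override
  public byte[] encode(Long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  @Override
  public Long decode(byte[] bytes) {
    if (bytes.length != Long.BYTES) {
      throw new IllegalArgumentException("expected " + Long.BYTES + " bytes, got " + bytes.length);
    }
    return ByteBuffer.wrap(bytes).getLong();
  }
}

final class CodecDemo {
  public static void main(String[] args) {
    ValueCodec<Long> codec = new LongCodec();
    byte[] wire = codec.encode(1234L); // the bytes a Value would carry back to the client
    System.out.println(wire.length); // prints 8
    System.out.println(codec.decode(wire)); // prints 1234
  }
}
```

Whichever side owns the codec, both ends still have to agree on the byte layout, which is the coupling this comment points out.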

                

        

[jira] [Commented] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "Billie Rinaldi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421512#comment-13421512 ] 

Billie Rinaldi commented on ACCUMULO-697:
-----------------------------------------

A related feature in our API is InputFormatBase<K,V>, which you can extend to provide any type of K,V. This was to support the ChunkInputFormat, which wraps a set of Values in an InputStream.

I could see making ScannerBase extend Iterable<E> instead of Iterable<Entry<Key,Value>>.  Then individual scanner implementations could provide whatever types they wanted, e.g. Iterable<KeyValue>, Iterable<Entry<Key,Value>>, Iterable<Entry<Key,Whatever>>.  We could model the use of these after the IsolatedScanner and ClientSideIteratorScanner, e.g. Scanner scanner = new ClientSideIteratorScanner(connector.createScanner(tableName, authorizations)).  The default types could stay Entry<Key,Value>, and you would use a client side scanner wrapper to translate them.
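
A rough sketch of what such a client-side wrapper could look like, assuming nothing about the real scanner classes. TranslatingScanner is a hypothetical name, and String and byte[] stand in for Accumulo's Key and Value so that the example has no dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;
import java.util.function.Function;

// Hypothetical wrapper for illustration; not an Accumulo class.
// String/byte[] are stand-ins for Key/Value.
final class TranslatingScanner<E> implements Iterable<E> {
  private final Iterable<Entry<String, byte[]>> source;
  private final Function<Entry<String, byte[]>, E> translator;

  TranslatingScanner(Iterable<Entry<String, byte[]>> source,
      Function<Entry<String, byte[]>, E> translator) {
    this.source = source;
    this.translator = translator;
  }

  // Translates each raw entry lazily as the caller iterates.
  @Override
  public Iterator<E> iterator() {
    final Iterator<Entry<String, byte[]>> inner = source.iterator();
    return new Iterator<E>() {
      @Override
      public boolean hasNext() {
        return inner.hasNext();
      }

      @Override
      public E next() {
        return translator.apply(inner.next());
      }
    };
  }
}

final class TranslatingScannerDemo {
  public static void main(String[] args) {
    // Stand-in for the entries an untyped scanner would return.
    List<Entry<String, byte[]>> raw = new ArrayList<Entry<String, byte[]>>();
    raw.add(new SimpleImmutableEntry<String, byte[]>("apple", "3".getBytes(StandardCharsets.UTF_8)));
    raw.add(new SimpleImmutableEntry<String, byte[]>("pear", "7".getBytes(StandardCharsets.UTF_8)));

    // The wrapper exposes typed entries; the default types stay untouched underneath.
    Iterable<Entry<String, Long>> typed = new TranslatingScanner<Entry<String, Long>>(raw,
        e -> new SimpleImmutableEntry<String, Long>(e.getKey(),
            Long.parseLong(new String(e.getValue(), StandardCharsets.UTF_8))));

    for (Entry<String, Long> e : typed) {
      System.out.println(e.getKey() + " = " + e.getValue());
    }
  }
}
```

The translation stays entirely on the client, so a wrapper of this shape would not require any change to what the server sends.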
                

        

[jira] [Commented] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "Josh Elser (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419984#comment-13419984 ] 

Josh Elser commented on ACCUMULO-697:
-------------------------------------

John, I should have mentioned that the "inspiration" for something like this came from the TypedValueCombiner. If someone is already instrumenting code to run in Accumulo (writing an SKVI), I see no reason not to let them actually deal in the types they want, rather than forcing them to deal with everything in terms of Key/Value.

Adam and Bill definitely hit on a larger issue of how far you can go with such an idea, but the scope of what I'll attempt here is much smaller than that.
                

        

[jira] [Work started] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "Josh Elser (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on ACCUMULO-697 started by Josh Elser.


        

[jira] [Commented] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "John Vines (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419978#comment-13419978 ] 

John Vines commented on ACCUMULO-697:
-------------------------------------

I don't know why this is something we want to integrate into our API. A utility can easily be written to do whatever transformations you want on top of the scanner's iterator, so why push this down into our API? I think a change like this could only serve to make the API harder to use for little to no real gain.
                

        

[jira] [Commented] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "Adam Fuchs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419922#comment-13419922 ] 

Adam Fuchs commented on ACCUMULO-697:
-------------------------------------

I like the concept, but does this go far enough? If Values aren't special, then are Keys special, and if so then why? Should we make our SortedKeyValueIterator implement Iterable<? extends Object> ? Then the bottom level iterator (RFile reader) would include KeyValue or Entry<Key,Value> objects, the top level iterator for scans would have to have objects that are serializable, and the top level iterator for compactions would have to implement Iterable<Entry<Key,Value>>.

One of the problems we have with iterators now is that the Key and Value are accessed with separate methods, even though they're always read off of disk together. Splitting up the Key and Value on the server side is sort of arbitrary and could reduce our ability to parallelize iterators (if we ever decide that's something we want to do).

Another problem is that SortedKeyValueIterator falls somewhere in between Java's Iterator and Iterable interfaces. SortedKeyValueIterator holds onto filters, aggregation parameters, etc. that make it act like a collection, and it keeps a pointer to somewhere in that collection like an Iterator. I think we should change SortedKeyValueIterator into something more like an immutable collection, or a consistent, isolated, unchanging view of the data, and have it implement Iterable. That might open up opportunities for automating optimization of queries on the server side, or better support for built-in iterator tree definition languages.
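
To make the Iterator/Iterable distinction concrete, here is a minimal sketch of the "immutable view" shape, under the assumption that the view holds only configuration (a source and a filter) and hands out a fresh cursor on every iterator() call. FilteredView is a hypothetical class written for this note; it is not a proposal for the actual SortedKeyValueIterator signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical type for illustration only; not an Accumulo class.
// The view holds configuration (source + filter) but never a cursor.
final class FilteredView<T> implements Iterable<T> {
  private final List<T> source; // private, unmodifiable snapshot
  private final Predicate<T> filter;

  FilteredView(List<T> source, Predicate<T> filter) {
    this.source = Collections.unmodifiableList(new ArrayList<T>(source));
    this.filter = filter;
  }

  // Adding a filter yields a new view; the receiver is left unchanged.
  FilteredView<T> where(Predicate<T> extra) {
    return new FilteredView<T>(source, filter.and(extra));
  }

  // Each call creates an independent cursor over the same unchanging data.
  @Override
  public Iterator<T> iterator() {
    final Iterator<T> inner = source.iterator();
    return new Iterator<T>() {
      private T nextItem;
      private boolean ready;

      @Override
      public boolean hasNext() {
        // Advance until a matching element is buffered or the source is exhausted.
        while (!ready && inner.hasNext()) {
          T candidate = inner.next();
          if (filter.test(candidate)) {
            nextItem = candidate;
            ready = true;
          }
        }
        return ready;
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ready = false;
        return nextItem;
      }
    };
  }
}

final class FilteredViewDemo {
  public static void main(String[] args) {
    FilteredView<Integer> all =
        new FilteredView<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6), n -> true);
    FilteredView<Integer> evens = all.where(n -> n % 2 == 0);

    Iterator<Integer> a = evens.iterator();
    Iterator<Integer> b = evens.iterator();
    a.next(); // advancing a does not move b
    System.out.println(a.next() + " " + b.next()); // prints "4 2"
  }
}
```

Because the cursor lives only in the object returned by iterator(), two consumers can walk the same view without interfering with each other.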
                

        

[jira] [Updated] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}

Posted by "Christopher Tubbs (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs updated ACCUMULO-697:
---------------------------------------

    Affects Version/s:     (was: 1.5.0)
    