Posted to dev@pig.apache.org by "Ted Malaska (JIRA)" <ji...@apache.org> on 2012/08/22 02:40:37 UTC

[jira] [Created] (PIG-2886) Add Scan TimeRange to HBaseStorage

Ted Malaska created PIG-2886:
--------------------------------

             Summary: Add Scan TimeRange to HBaseStorage 
                 Key: PIG-2886
                 URL: https://issues.apache.org/jira/browse/PIG-2886
             Project: Pig
          Issue Type: Bug
            Reporter: Ted Malaska
            Priority: Minor


I have a client that wants to use Pig.  They are using MR now.  They can't use Pig right now because they only want to fetch the last day's worth of data in HBase.  A filter with a time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use a Scan TimeRange, we only need to read very little in comparison.
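
A rough sketch of why a scan-level time range can read less than a row filter, assuming (as in HBase) that each store file records the oldest and newest cell timestamps it holds. The class, method, and file names below are hypothetical and do not use the HBase API; they only model the idea of skipping files whose timestamps fall outside the scan window.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of store files; not HBase code. All names here are hypothetical. */
final class StoreFileSkipSketch {

    /** A store file summarized by the oldest and newest cell timestamps it contains (ms). */
    static final class FileInfo {
        final String name;
        final long minTs;
        final long maxTs;

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** Returns the names of files whose timestamps overlap the half-open window [scanMin, scanMax). */
    static List<String> filesToRead(List<FileInfo> files, long scanMin, long scanMax) {
        List<String> selected = new ArrayList<String>();
        for (FileInfo f : files) {
            boolean overlaps = f.maxTs >= scanMin && f.minTs < scanMax;
            if (overlaps) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long dayMs = 24L * 60 * 60 * 1000;
        long now = 10 * dayMs; // pretend "now" is the start of day 10
        List<FileInfo> files = new ArrayList<FileInfo>();
        files.add(new FileInfo("old-compacted", 0, 8 * dayMs));
        files.add(new FileInfo("flush-day8", 8 * dayMs + 1, 9 * dayMs - 1));
        files.add(new FileInfo("flush-day9", 9 * dayMs, 9 * dayMs + 1000));
        // Scan only the last day's worth of data; only files that overlap it are listed.
        System.out.println(filesToRead(files, now - dayMs, now));
    }
}
```

Under this model, a major compaction that merges everything into one file spanning all timestamps would leave nothing to skip, which appears to be why the description suggests fetching before compacting.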

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated PIG-2886:
-----------------------------

    Attachment: PIG-2886-0.patch

Adds timeRange to HBaseStorage
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447046#comment-13447046 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

Wow, thanks.  Sorry about the missing --no-prefix; I will make sure to use it in the future.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.11
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch, PIG-2886-3.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Updated] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated PIG-2886:
-----------------------------

    Attachment: PIG-2886-1.patch

Making progress.  Need to check some more things before I'm totally done.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439197#comment-13439197 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

OK, sounds good. I can't do it tonight.  I'll read the directions and do it tomorrow.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Updated] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated PIG-2886:
-----------------------------

    Status: Open  (was: Patch Available)

Making more changes
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 0.9.1, 0.9.0
>            Reporter: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447036#comment-13447036 ] 

Dmitriy V. Ryaboy commented on PIG-2886:
----------------------------------------

Urk... a git patch. You need to generate it with 'git diff --no-prefix'; otherwise we can't apply it. I mean, we can, and I did, but for next time, --no-prefix makes life easier :).
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.11
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch, PIG-2886-3.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439273#comment-13439273 ] 

Bill Graham commented on PIG-2886:
----------------------------------

Thanks for the patch, Ted! The code looks good; just a few nits, mainly about style.

- The patch includes a bunch of diff info about your git internal files, so it doesn't apply.
- Standard indents in Pig are 4 spaces (no tabs).
- Use a single space after brackets and between close parens and open brackets in your else/if statements.
- else should be on one line, i.e., } else {
- A pair of empty newlines were added after the {{ignoreWhitespace_}} block, which should be removed.
- Typos: TimeRagne and "Timestamp most be"

Also, would you please add a unit test to TestHBaseStorage?
 


                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446842#comment-13446842 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

Give me a couple of minutes; I need to check in a new comment and help message.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439195#comment-13439195 ] 

Cheolsoo Park commented on PIG-2886:
------------------------------------

Hi Ted, I think that you need to post your patch to this jira and wait for a committer to review/commit it. Please refer to:

https://cwiki.apache.org/confluence/display/PIG/HowToContribute
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443636#comment-13443636 ] 

Cheolsoo Park commented on PIG-2886:
------------------------------------

Hi Ted,

Regarding TestHBaseStorage, does it hang in hadoop 20 or 23? I assume that you're not setting "-Dhadoopversion", so you're using hadoop 20 by default. In hadoop 20, TestHBaseStorage passes for me with your patch. I.e. "ant clean test -Dtestcase=TestHBaseStorage -Dhadoopversion=20" passes.
{code}
[junit] Running org.apache.pig.test.TestHBaseStorage
[junit] Tests run: 23, Failures: 0, Errors: 0, Time elapsed: 131.728 sec
{code}
If it doesn't pass for you, it is probably an environment issue. (e.g. did you set umask 0022?)

However, it does time out in hadoop 23, and I believe that it's expected since the hbase jar from the maven repository is not binary compatible with hadoop 23. I.e. "ant clean test -Dtestcase=TestHBaseStorage -Dhadoopversion=23" fails with a timeout error, and the following error can be found in the test log (build/test/logs/TEST-org.apache.pig.test.TestHBaseStorage.txt):
{code}
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 7 more
{code}

I ran into the same issue while bumping hbase to 0.94, but it seems to apply to 0.90 (current version in trunk) as well. Please see HBASE-5680 for more details.

Anyone, please correct me if I am wrong about TestHBaseStorage in hadoop 23.

Thanks!
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446829#comment-13446829 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

Found two typos.  I will have the fix and the maxVersion functionality soon.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446840#comment-13446840 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

OK, sounds good.  I will just fix the typos then.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444694#comment-13444694 ] 

Dmitriy V. Ryaboy commented on PIG-2886:
----------------------------------------

Hi Ted,
Great to see clouderians contributing to Pig again! :)

Couple of notes:

minTimeRange, maxTimeRange -- maybe better names would be minTimestamp and maxTimestamp ?
That's the signature for HBase's scanTimeRange.

Also, please fix up documentation -- minTimestamp in scan.setTimeRange is *inclusive* (so, not strictly greater than). maxTimestamp is, indeed, exclusive -- the range is [min, max)

space between } and "else" around maxTimeRange  handling.

HBase scan also provides setTimestamp(). Might as well throw that in?

Does your client care about # of returned versions? That's a much trickier change.
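
To make the [min, max) semantics above concrete, here is a small self-contained sketch of a half-open interval check. It deliberately does not call HBase; `HalfOpenRange` and its methods are hypothetical names, and expressing a single-timestamp scan as the range [ts, ts + 1) is an assumption made for illustration.

```java
/** Illustrative half-open timestamp range [min, max); not the HBase TimeRange class. */
final class HalfOpenRange {
    final long min; // inclusive lower bound
    final long max; // exclusive upper bound

    HalfOpenRange(long min, long max) {
        if (max < min) {
            throw new IllegalArgumentException("max must be >= min");
        }
        this.min = min;
        this.max = max;
    }

    /** A single timestamp expressed as the one-element range [ts, ts + 1). */
    static HalfOpenRange exactly(long ts) {
        return new HalfOpenRange(ts, ts + 1);
    }

    /** True when ts lies in [min, max): the lower bound counts, the upper bound does not. */
    boolean contains(long ts) {
        return ts >= min && ts < max;
    }

    public static void main(String[] args) {
        HalfOpenRange r = new HalfOpenRange(100L, 200L);
        System.out.println(r.contains(100L)); // lower bound is included
        System.out.println(r.contains(200L)); // upper bound is excluded
        System.out.println(HalfOpenRange.exactly(150L).contains(150L));
    }
}
```

With this convention, consecutive windows such as [t0, t1) and [t1, t2) never return the same cell twice, which is one practical reason to document the bounds precisely.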



                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446843#comment-13446843 ] 

Dmitriy V. Ryaboy commented on PIG-2886:
----------------------------------------

Ok I'll take a look tomorrow. Going rogue for a bit, disconnecting :).
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440367#comment-13440367 ] 

Bill Graham commented on PIG-2886:
----------------------------------

Only a subset of tests run during test-commit. The test target will run all of them (and take a while).  Also, annotations are used to indicate that the class contains tests.

You can do this to test just one test:

{noformat}
ant clean test -Dtestcase=TestHBaseStorage
{noformat}

                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Updated] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated PIG-2886:
-----------------------------

    Attachment: PIG-2886-3.patch

Fixed typos
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch, PIG-2886-3.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439191#comment-13439191 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

Submitted fix with pull request from tmalaska/pig github.  Let me know what I need to get this into PIG.  My client would like to use HBaseStorage from PIG instead of a fixed version I gave them in a jar.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Updated] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated PIG-2886:
-----------------------------

    Attachment: PIG-2886-2.patch

Cleaned things up.  Changed maxTimeRange and minTimeRange to maxTimestamp and minTimestamp.  I also added a timestamp option, along with unit tests.
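
As a rough illustration of how the three options named in this update might combine into one scan window, the sketch below resolves optional values into a [min, max) pair. The option names come from the comment above; the defaults (0 and Long.MAX_VALUE), the precedence given to timestamp, and the `ScanWindowSketch` class itself are assumptions for illustration and are not taken from the patch.

```java
/** Hypothetical resolution of minTimestamp / maxTimestamp / timestamp options; not the patch code. */
final class ScanWindowSketch {

    /**
     * Returns {min, max} for a half-open window [min, max).
     * A null argument means the option was not supplied.
     */
    static long[] resolve(Long minTimestamp, Long maxTimestamp, Long timestamp) {
        if (timestamp != null) {
            // Assumed precedence: an exact timestamp overrides any range options.
            return new long[] { timestamp, timestamp + 1 };
        }
        // Assumed defaults: an unbounded window when a bound is missing.
        long min = (minTimestamp != null) ? minTimestamp : 0L;
        long max = (maxTimestamp != null) ? maxTimestamp : Long.MAX_VALUE;
        if (min > max) {
            throw new IllegalArgumentException("minTimestamp must not exceed maxTimestamp");
        }
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        long dayMs = 24L * 60 * 60 * 1000;
        long now = System.currentTimeMillis();
        // The use case from the issue description: only the last day's worth of data.
        long[] lastDay = resolve(now - dayMs, now, null);
        System.out.println("[" + lastDay[0] + ", " + lastDay[1] + ")");
    }
}
```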
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch
>
>
> I have a client that wants to use pig.  They are using MR now.  They can't use PIG right now because they only want to fetch the last day's worth of data in HBase.  A filter with time range would require reading all the HStore files.  If we hold major compaction until after the fetch and use Scan Time Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446839#comment-13446839 ] 

Dmitriy V. Ryaboy commented on PIG-2886:
----------------------------------------

Let's keep the issue of multiple versions separate -- it's not entirely clear how those should be returned (a bag?)
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch

[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440270#comment-13440270 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

Question.  

I added the test cases and ran the following command, and I noticed that TestHBaseStorage doesn't run.

ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean jar test-commit

I'm thinking that's because TestHBaseStorage doesn't extend TestCase, and no other class calls TestHBaseStorage.

So the question is: is there a design reason why TestHBaseStorage is not run with the unit tests?  Is it OK if I make TestHBaseStorage run during unit tests?

                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch

[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443640#comment-13443640 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

Great thanks.  Got it.  

I was first running it on my local machine (no Hadoop) and it would freeze.  Then I tried it on CDH4 and it didn't work either.  I will try it on CDH3 tonight.

By the way, do you see anything else in the code I should add or clean up?

I should have time to work on it tonight.

Ted Malaska  
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch

[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446169#comment-13446169 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

Thx will work on it now.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch

[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443559#comment-13443559 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

Thanks Bill,

I tried running TestHBaseStorage and it freezes in setUp().

> public void setUp() throws Exception {
>     // This is needed by Pig
>     cluster = MiniCluster.buildCluster();
>     conf = cluster.getConfiguration();
>
>     // Start ZooKeeper, then an HBase mini cluster (1 master, 1 region server)
>     util = new HBaseTestingUtility(conf);
>     util.startMiniZKCluster();
>     util.startMiniHBaseCluster(1, 1);
> }

Just wondering if you know what I'm missing to make this work.  Hopefully I will get time in the next couple of days to research this.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch

[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446700#comment-13446700 ] 

Ted Malaska commented on PIG-2886:
----------------------------------

Hey Dmitriy,

I made the changes you requested and added in the setTimestamp() option.  

I would love to do the # of versions change, but can I do that in another JIRA issue so I can have this one closed? :)

Thanks
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch

[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446841#comment-13446841 ] 

Dmitriy V. Ryaboy commented on PIG-2886:
----------------------------------------

Ted, let me know if you want me to review, k? Wasn't clear to me from the last message if you are in a 'done' state or if you are just posting intermediate work right now.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch

[jira] [Resolved] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy resolved PIG-2886.
------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.9.3)
                   0.11

Applied to trunk.

Is there any need to apply this to the 0.10 branch?

Not sure we'll do a 0.10.1 release at this point.
                
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.11
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch, PIG-2886-3.patch

[jira] [Assigned] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy reassigned PIG-2886:
--------------------------------------

    Assignee: Ted Malaska
    
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch

[jira] [Updated] (PIG-2886) Add Scan TimeRange to HBaseStorage

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated PIG-2886:
-----------------------------

                 Tags: HBaseStorage timeRange timeStamp
        Fix Version/s: 0.9.3
               Labels: newbie  (was: )
    Affects Version/s: 0.9.0
                       0.9.1
                       0.9.2
         Release Note: Added the ability to set HBase Scan's maxTimestamp, minTimestamp and timestamp in HBaseStorage.  I also added unit tests.
               Status: Patch Available  (was: Open)
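
For anyone wanting to try the "last day's worth of data" case from the description, here is a rough sketch of how the timestamp bounds could be computed and turned into an options string. It is only an illustration: the class TimeRangeOptions and its methods are hypothetical helpers, not part of the patch, and the option spelling (-minTimestamp / -maxTimestamp, space-separated) is an assumption based on the release note above. As far as I know, an HBase Scan time range includes the min and excludes the max.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Pig or HBase.
final class TimeRangeOptions {
    private TimeRangeOptions() {}

    /** Returns {min, max} in epoch millis for the `days` days ending at nowMillis. */
    static long[] lastDays(long nowMillis, int days) {
        long min = nowMillis - TimeUnit.DAYS.toMillis(days);
        return new long[] { min, nowMillis };
    }

    /**
     * Formats the bounds as an options string. The option names are assumed
     * from the release note (minTimestamp, maxTimestamp); check them against
     * the HBaseStorage version you are running.
     */
    static String toOptions(long minMillis, long maxMillis) {
        if (minMillis < 0 || maxMillis <= minMillis) {
            throw new IllegalArgumentException("need 0 <= min < max");
        }
        return "-minTimestamp " + minMillis + " -maxTimestamp " + maxMillis;
    }

    public static void main(String[] args) {
        long[] range = lastDays(System.currentTimeMillis(), 1);
        System.out.println(toOptions(range[0], range[1]));
    }
}
```

The printed string would then be passed as the second constructor argument to HBaseStorage in the LOAD statement, next to the column list. Computing the bounds outside the script keeps the Pig script itself free of date arithmetic.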
    
> Add Scan TimeRange to HBaseStorage 
> -----------------------------------
>
>                 Key: PIG-2886
>                 URL: https://issues.apache.org/jira/browse/PIG-2886
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 0.9.1, 0.9.0
>            Reporter: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.9.3
>
>         Attachments: PIG-2886-0.patch, PIG-2886-1.patch, PIG-2886-2.patch