Posted to notifications@accumulo.apache.org by "Carl Austin (JIRA)" <ji...@apache.org> on 2014/06/17 12:54:02 UTC
[jira] [Commented] (ACCUMULO-143) Accumulo Hive

    [ https://issues.apache.org/jira/browse/ACCUMULO-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033652#comment-14033652 ] 

Carl Austin commented on ACCUMULO-143:
--------------------------------------

I have a forked version of this for testing, and any query that doesn't need every column is slower than it needs to be (about 5 times slower in my testing when selecting a single column, for example), so I've modified it to fetch only the columns the query needs. I can't easily create a patch because my fork has diverged too far, but the necessary part is:

In the configure method:
{code}
            // Parse the query string and collect the columns it references.
            ASTNode node = driver.parse(conf.get("hive.query.string"));
            node = ParseUtils.findRootNonNullToken(node);
            findColumns(node, columns);
            Collection<Pair<Text, Text>> pairs = Lists.newArrayList();
            if (!columns.isEmpty()) {
                for (String col : columns) {
                    // The mapped name is split on '|' into the two parts of the column pair.
                    String[] pair = AccumuloHiveUtils.hiveToAccumulo(col, conf).split("\\|");
                    pairs.add(new Pair<Text, Text>(new Text(pair[0]), new Text(pair[1])));
                }
            } else {
                // No columns were found in the query, so fall back to fetching all of them.
                pairs = getPairCollection(colQualFamPairs, false);
            }
{code}
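
To show the mapping step in isolation, here is a minimal sketch. It assumes that hiveToAccumulo returns a string of the form family|qualifier, which the split on "\\|" suggests but the comment does not state. ColumnPairs and toPairs are hypothetical names, and Map.Entry stands in for Pair<Text, Text> so the example runs without Hadoop or Accumulo on the classpath. Unlike the snippet above, it rejects a mapping with no '|' instead of failing on pair[1].

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the fork: turns "family|qualifier"
// strings into (family, qualifier) pairs.
final class ColumnPairs {
    private ColumnPairs() {}

    static List<Map.Entry<String, String>> toPairs(List<String> mappings) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String mapping : mappings) {
            // A limit of 2 keeps any further '|' characters inside the second part.
            String[] parts = mapping.split("\\|", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException(
                        "Expected family|qualifier but got: " + mapping);
            }
            pairs.add(new SimpleImmutableEntry<>(parts[0], parts[1]));
        }
        return pairs;
    }
}
```

Failing fast here is a design choice for the sketch; a real implementation might instead treat a bare name as a family with an empty qualifier.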

A new method:
{code}
    // Recursively collects the lower-cased names of all columns referenced under node.
    public void findColumns(ASTNode node, List<String> columns) {
        // TODO: This should compare against HiveParser.TOK_TABLE_OR_COL rather than 784,
        // but that doesn't seem to work in my case. This is a hacky fix and may not work
        // for other versions of Hive.
        if (node.getToken().getType() == 784) {
            columns.add(node.getChild(0).getText().toLowerCase());
        } else if (node.getChildren() != null) {
            for (Node child : node.getChildren()) {
                findColumns((ASTNode) child, columns);
            }
        }
    }
{code}
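
The recursion is easier to follow outside of Hive. The sketch below is only an illustration: SimpleNode is a hypothetical stand-in for ASTNode, and TABLE_OR_COL is a made-up constant, not the parser's real token id. It collects names into a LinkedHashSet rather than a List, so a column referenced several times in one query is recorded once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for Hive's ASTNode: a token type, its text, and children.
final class SimpleNode {
    static final int TABLE_OR_COL = 1; // made-up value, not Hive's token id
    static final int OTHER = 0;

    final int type;
    final String text;
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(int type, String text, SimpleNode... kids) {
        this.type = type;
        this.text = text;
        for (SimpleNode kid : kids) {
            children.add(kid);
        }
    }
}

final class ColumnFinder {
    private ColumnFinder() {}

    // Returns the lower-cased column names in the order they are first seen.
    static Set<String> findColumns(SimpleNode root) {
        Set<String> columns = new LinkedHashSet<>();
        collect(root, columns);
        return columns;
    }

    private static void collect(SimpleNode node, Set<String> columns) {
        if (node.type == SimpleNode.TABLE_OR_COL && !node.children.isEmpty()) {
            // The first child of a column-reference node carries the column name.
            columns.add(node.children.get(0).text.toLowerCase(Locale.ROOT));
            return;
        }
        for (SimpleNode child : node.children) {
            collect(child, columns);
        }
    }
}
```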

This isn't perfect yet. For example, a query such as count(1) references no columns, so the column list stays empty and everything is still fetched.

I've also added a SerDe property that lets you configure additional columns to fetch when creating the table. Without it, columns that iterators use to calculate new columns may not be mapped in the create statement, so they are never fetched and the "calculated" columns never work.
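
As a rough sketch of that idea, the columns found in the query can be merged with a list read from a table property. The property name accumulo.extra.fetch.columns, its comma-separated format, and the ExtraColumns class are all assumptions made for illustration; the real key used in my fork is not shown here.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

final class ExtraColumns {
    // Hypothetical property name, chosen only for this example.
    static final String EXTRA_COLUMNS_KEY = "accumulo.extra.fetch.columns";

    private ExtraColumns() {}

    // Returns the query columns followed by any extra columns from the property.
    static Set<String> withExtras(Set<String> queryColumns, Properties tableProps) {
        Set<String> all = new LinkedHashSet<>(queryColumns);
        String extras = tableProps.getProperty(EXTRA_COLUMNS_KEY, "");
        for (String col : extras.split(",")) {
            String trimmed = col.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                all.add(trimmed);
            }
        }
        return all;
    }
}
```

The extras only matter when the query column list is non-empty, since an empty list already falls back to fetching everything.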

Let me know if you'd like any more info.

> Accumulo Hive
> -------------
>
>                 Key: ACCUMULO-143
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-143
>             Project: Accumulo
>          Issue Type: Task
>          Components: contrib
>    Affects Versions: 1.6.0
>            Reporter: Keith Turner
>         Attachments: ACCUMULO-143.patch
>
>
> Need to look into adding support for Accumulo to Hive



--
This message was sent by Atlassian JIRA
(v6.2#6252)