Posted to issues@hawq.apache.org by sansanichfb <gi...@git.apache.org> on 2017/04/29 03:43:21 UTC

[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

GitHub user sansanichfb opened a pull request:

    https://github.com/apache/incubator-hawq/pull/1225

    HAWQ-1446: Introduce vectorized profile for ORC.

    Work is still in progress; I want to get early feedback.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sansanichfb/incubator-hawq HAWQ-1446

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/1225.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1225
    
----
commit 9fb7929120910163e30043b4fd2ebd000f869b4c
Author: Oleksandr Diachenko <od...@pivotal.io>
Date:   2017-04-18T21:38:45Z

    [#143733171] Added vectorized accessor and new profile.

commit b65e0e25f6a0520af9fc84ffe71d340c3c896948
Author: Oleksandr Diachenko <od...@pivotal.io>
Date:   2017-04-21T08:27:05Z

    [#143192433] Added batch resolver.

----



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118131080
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
    @@ -0,0 +1,257 @@
    [... ASF license header, imports, and preceding code elided ...]
    +        /* process all columns*/
    +        for (int columnIndex = 0; columnIndex < vectorizedBatch.numCols; columnIndex++) {
    +            ObjectInspector oi = soi.getAllStructFieldRefs().get(columnIndex).getFieldObjectInspector();
    --- End diff --
    
    Call soi.getAllStructFieldRefs() outside of the for loop.
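
    A minimal sketch of that hoisting, reusing the names from the quoted diff (an illustration, not the PR's actual change; StructField comes from the already-imported serde2.objectinspector package):

        // Fetch the struct field refs once instead of on every loop iteration.
        List<? extends StructField> fields = soi.getAllStructFieldRefs();
        for (int columnIndex = 0; columnIndex < vectorizedBatch.numCols; columnIndex++) {
            ObjectInspector oi = fields.get(columnIndex).getFieldObjectInspector();
            // ... resolve the column as before
        }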



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/1225



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by shivzone <gi...@git.apache.org>.
Github user shivzone commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118339930
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
    @@ -0,0 +1,257 @@
    [... ASF license header, imports, and preceding code elided ...]
    +    private void resolvePrimitiveColumn(int columnIndex, ObjectInspector oi, VectorizedRowBatch vectorizedBatch) {
    +
    +        OneField field = null;
    +        Writable writableObject = null;
    +        /* process all rows from current batch for given column */
    +        for (int rowIndex = 0; rowIndex < vectorizedBatch.size; rowIndex++) {
    --- End diff --
    
    Since we know that all the writableObjects in a column are of the same type, why don't we simply extract all the values in the column into a Writable[] and resolve the entire array at once, using just one switch to determine the field type and resolving each writable value in one pass?
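
    A hedged sketch of that column-at-a-time shape for one primitive category, reusing names from the quoted diff and the PrimitiveObjectInspector poi from the cast above (LongColumnVector covers the integer family; the remaining categories would follow the same pattern):

        switch (poi.getPrimitiveCategory()) {
            case LONG: {
                LongColumnVector lcv = (LongColumnVector) vectorizedBatch.cols[columnIndex];
                for (int rowIndex = 0; rowIndex < vectorizedBatch.size; rowIndex++) {
                    int valueIndex = lcv.isRepeating ? 0 : rowIndex;
                    // Respect the vector's null flags before reading the value.
                    Object value = (!lcv.noNulls && lcv.isNull[valueIndex])
                            ? null : new LongWritable(lcv.vector[valueIndex]);
                    resolvedBatch.get(rowIndex).set(columnIndex, new OneField(BIGINT.getOID(), value));
                }
                break;
            }
            // ... one case per remaining primitive category
            default:
                throw new UnsupportedTypeException("Unsupported primitive category for column " + columnIndex);
        }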



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129835
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
    @@ -0,0 +1,115 @@
    [... ASF license header, imports, and preceding code elided ...]
    +/**
    + * Accessor class which reads data in batches.
    + * One batch is 1024 rows of all projected columns
    + *
    + */
    +public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
    +
    +    protected RecordReader vrr;
    --- End diff --
    
    Why protected? Is any child class using it?



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by shivzone <gi...@git.apache.org>.
Github user shivzone commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118332954
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
    @@ -0,0 +1,257 @@
    [... ASF license header, imports, and preceding code elided ...]
    +        /* process all columns*/
    +        for (int columnIndex = 0; columnIndex < vectorizedBatch.numCols; columnIndex++) {
    +            ObjectInspector oi = soi.getAllStructFieldRefs().get(columnIndex).getFieldObjectInspector();
    +            if (oi.getCategory() == Category.PRIMITIVE) {
    +                PrimitiveObjectInspector poi = (PrimitiveObjectInspector ) oi;
    +                resolvePrimitiveColumn(columnIndex, oi, vectorizedBatch);
    +            } else {
    +                throw new UnsupportedTypeException("Unable to resolve column index:" +  columnIndex + ". Only primitive types are supported.");
    --- End diff --
    
    Can't we catch this error upfront instead, by checking whether the schema has any non-primitive column when the BatchResolver is used?
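
    One hedged way to do that upfront check, e.g. in the resolver constructor (illustrative only; the author's reply further down argues against validating twice):

        // Fail fast if the ORC schema contains any non-primitive column.
        for (StructField field : soi.getAllStructFieldRefs()) {
            if (field.getFieldObjectInspector().getCategory() != Category.PRIMITIVE) {
                throw new UnsupportedTypeException("Column " + field.getFieldName()
                        + " is not primitive; the vectorized ORC profile supports primitive types only.");
            }
        }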



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118808410
  
    --- Diff: pxf/pxf-hive/src/test/java/org/apache/hawq/pxf/plugins/hive/utilities/ProfileFactoryTest.java ---
    @@ -34,31 +34,31 @@
         public void get() throws Exception {
     
             // For TextInputFormat when table has no complex types, HiveText profile should be used
    -        String profileName = ProfileFactory.get(new TextInputFormat(), false);
    +        String profileName = ProfileFactory.get(new TextInputFormat(), false, null);
    --- End diff --
    
    Sure



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129209
  
    --- Diff: pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/Utilities.java ---
    @@ -234,4 +235,15 @@ public static boolean useStats(ReadAccessor accessor, InputData inputData) {
                 return false;
             }
         }
    +
    +    public static boolean useVectorization(InputData inputData) {
    +        boolean isVectorizedResolver = false;
    +        try {
    +            isVectorizedResolver = ArrayUtils.contains(Class.forName(inputData.getResolver()).getInterfaces(), ReadVectorizedResolver.class);
    +        } catch (ClassNotFoundException e) {
    +            LOG.error("Unable to load resolver class: " + e.getMessage());
    +            return false;
    --- End diff --
    
    No need for this line; the function will return the default false value at the end anyway.
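
    Applied to the quoted method, the suggestion amounts to something like:

        public static boolean useVectorization(InputData inputData) {
            boolean isVectorizedResolver = false;
            try {
                isVectorizedResolver = ArrayUtils.contains(
                        Class.forName(inputData.getResolver()).getInterfaces(),
                        ReadVectorizedResolver.class);
            } catch (ClassNotFoundException e) {
                // Fall through: isVectorizedResolver keeps its default false value.
                LOG.error("Unable to load resolver class: " + e.getMessage());
            }
            return isVectorizedResolver;
        }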



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118130921
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
    @@ -0,0 +1,257 @@
    [... ASF license header, imports, and preceding code elided ...]
    +        // Allocate empty result set
    +        resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
    +        for (int i = 0; i < vectorizedBatch.size; i++) {
    +            ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
    +            resolvedBatch.add(row);
    +            for (int j = 0; j < inputData.getColumns(); j++) {
    +                row.add(null);
    --- End diff --
    
    Probably not very efficient; it'd be better to have an empty template row that can be cloned using the underlying Arrays.copyOf() or similar methods.
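
    A minimal sketch of that idea (requires java.util.Collections): Collections.nCopies() yields an immutable null-filled template, and the ArrayList copy constructor clones it in a single array copy rather than N add(null) calls:

        // Build one null-filled template row, then copy it once per row.
        List<OneField> template = Collections.nCopies(inputData.getColumns(), (OneField) null);
        resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
        for (int i = 0; i < vectorizedBatch.size; i++) {
            resolvedBatch.add(new ArrayList<OneField>(template));
        }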



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118606404
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
    @@ -0,0 +1,257 @@
    [... ASF license header, imports, and preceding code elided ...]
    +        /* process all columns*/
    +        for (int columnIndex = 0; columnIndex < vectorizedBatch.numCols; columnIndex++) {
    +            ObjectInspector oi = soi.getAllStructFieldRefs().get(columnIndex).getFieldObjectInspector();
    +            if (oi.getCategory() == Category.PRIMITIVE) {
    +                PrimitiveObjectInspector poi = (PrimitiveObjectInspector ) oi;
    +                resolvePrimitiveColumn(columnIndex, oi, vectorizedBatch);
    +            } else {
    +                throw new UnsupportedTypeException("Unable to resolve column index:" +  columnIndex + ". Only primitive types are supported.");
    --- End diff --
    
    This is the very first place where we start processing columns; I don't think we should double-validate them.



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118599822
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
    @@ -0,0 +1,115 @@
    [... ASF license header, imports, and preceding code elided ...]
    +    /**
    +     * This method updated reader optionst to include projected columns only.
    +     * @param options reader options to modify
    +     * @throws Exception
    +     */
    +    private void addColumns(Options options) throws Exception {
    +        boolean[] includeColumns = new boolean[inputData.getColumns() + 1];
    --- End diff --
    
    That's the way the ORC batch API expects this parameter.
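
    For readers unfamiliar with that convention: ORC's Reader.Options.include() takes one flag per column, with index 0 reserved for the root struct, hence the getColumns() + 1 sizing. A hedged sketch (projectedIndexes is a hypothetical list of 0-based projected column positions):

        boolean[] includeColumns = new boolean[inputData.getColumns() + 1];
        includeColumns[0] = true;              // index 0 marks the root struct itself
        for (int col : projectedIndexes) {     // hypothetical projection list
            includeColumns[col + 1] = true;    // shifted by one past the root struct
        }
        options.include(includeColumns);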



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118601254
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
    @@ -0,0 +1,115 @@
    [... ASF license header, imports, and preceding code elided ...]
    +/**
    + * Accessor class which reads data in batches.
    + * One batch is 1024 rows of all projected columns
    + *
    + */
    +public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
    --- End diff --
    
    Sure, updated



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118130293
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
    @@ -0,0 +1,115 @@
    [... ASF license header, imports, and preceding code elided ...]
    +    /**
    +     * This method updated reader optionst to include projected columns only.
    +     * @param options reader options to modify
    +     * @throws Exception
    +     */
    +    private void addColumns(Options options) throws Exception {
    +        boolean[] includeColumns = new boolean[inputData.getColumns() + 1];
    --- End diff --
    
    Probably not possible to change now, but bitmaps would be more efficient here.
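
    For illustration, a java.util.BitSet variant would look roughly like this; note that the ORC Options.include() API still takes a boolean[], so a conversion would be needed at the boundary:

        BitSet projected = new BitSet(inputData.getColumns() + 1);
        projected.set(0); // root struct
        // ... set one bit per projected column, then convert for the ORC API:
        boolean[] includeColumns = new boolean[inputData.getColumns() + 1];
        for (int i = projected.nextSetBit(0); i >= 0; i = projected.nextSetBit(i + 1)) {
            includeColumns[i] = true;
        }
        options.include(includeColumns);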



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118601231
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
    @@ -0,0 +1,115 @@
    [... ASF license header, imports, and preceding code elided ...]
    +    /**
    +     * This method updated reader optionst to include projected columns only.
    --- End diff --
    
    Thanks, fixed



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by shivzone <gi...@git.apache.org>.
Github user shivzone commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118358798
  
    --- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/ReadVectorizedBridge.java ---
    @@ -0,0 +1,126 @@
    +package org.apache.hawq.pxf.service;
    --- End diff --
    
    ReadVectorizedBridge looks very similar to ReadBridge except for the getNext() function. Please refactor both classes to avoid duplication.
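
    A hedged sketch of one way to factor that (names hypothetical, not from the PR): a template-method base class owns the shared driver loop, and each bridge supplies only its unit-of-work logic:

        public abstract class AbstractReadBridge {
            // Shared driver: fetch one unit (a single row or a whole batch) and serialize it.
            public Writable getNext() throws Exception {
                Object unit = readNextUnit();
                return unit == null ? null : buildOutput(unit);
            }

            protected abstract Object readNextUnit() throws Exception;   // row vs. batch
            protected abstract Writable buildOutput(Object unit) throws Exception;
        }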



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118807905
  
    --- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/BridgeOutputBuilder.java ---
    @@ -137,6 +137,18 @@ public Writable getErrorOutput(Exception ex) throws Exception {
             return outputList;
         }
     
    +    public LinkedList<Writable> makeVectorizedOutput(List<List<OneField>> recordsBatch) throws BadRecordException {
    +        outputList.clear();
    +        for (List<OneField> record : recordsBatch) {
    --- End diff --
    
    no null checks necessary for recordsBatch and record ?
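
    For illustration, the kind of guards meant here, sketched against the method above (whether a null record should be skipped or rejected is an open question):

        public LinkedList<Writable> makeVectorizedOutput(List<List<OneField>> recordsBatch) throws BadRecordException {
            outputList.clear();
            if (recordsBatch == null) {
                return outputList; // no batch, nothing to resolve
            }
            for (List<OneField> record : recordsBatch) {
                if (record == null) {
                    continue; // or throw a BadRecordException, depending on the contract
                }
                // ... build the Writable output for the record as before
            }
            return outputList;
        }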



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118135496
  
    --- Diff: pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/Utilities.java ---
    @@ -234,4 +235,15 @@ public static boolean useStats(ReadAccessor accessor, InputData inputData) {
                 return false;
             }
         }
    +
    +    public static boolean useVectorization(InputData inputData) {
    +        boolean isVectorizedResolver = false;
    +        try {
    +            isVectorizedResolver = ArrayUtils.contains(Class.forName(inputData.getResolver()).getInterfaces(), ReadVectorizedResolver.class);
    +        } catch (ClassNotFoundException e) {
    +            LOG.error("Unable to load resolver class: " + e.getMessage());
    +            return false;
    --- End diff --
    
    Sure, thanks




[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118602026
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
    @@ -0,0 +1,257 @@
    +package org.apache.hawq.pxf.plugins.hive;
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
    +import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
    +import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
    +import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
    +import static org.apache.hawq.pxf.api.io.DataType.DATE;
    +import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
    +import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
    +import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
    +import static org.apache.hawq.pxf.api.io.DataType.REAL;
    +import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
    +import static org.apache.hawq.pxf.api.io.DataType.TEXT;
    +import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
    +import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
    +
    +import java.math.BigDecimal;
    +import java.util.ArrayList;
    +import java.util.Calendar;
    +import java.util.List;
    +import java.sql.Timestamp;
    +import java.sql.Date;
    +
    +import org.apache.commons.logging.Log;
    +import org.apache.commons.logging.LogFactory;
    +import org.apache.hadoop.hive.common.type.HiveDecimal;
    +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
    +import org.apache.hadoop.io.Writable;
    +import org.apache.hadoop.io.LongWritable;
    +import org.apache.hadoop.io.DoubleWritable;
    +import org.apache.hadoop.io.FloatWritable;
    +import org.apache.hadoop.io.Text;
    +import org.apache.hawq.pxf.api.OneField;
    +import org.apache.hawq.pxf.api.OneRow;
    +import org.apache.hawq.pxf.api.ReadVectorizedResolver;
    +import org.apache.hawq.pxf.api.UnsupportedTypeException;
    +import org.apache.hawq.pxf.api.io.DataType;
    +import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
    +import org.apache.hawq.pxf.api.utilities.InputData;
    +import org.apache.hawq.pxf.api.utilities.Plugin;
    +import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
    +import org.apache.hadoop.hive.serde2.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    +import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
    +import org.apache.hadoop.hive.ql.exec.vector.*;
    +
    +@SuppressWarnings("deprecation")
    +public class HiveORCBatchResolver extends Plugin implements ReadVectorizedResolver {
    +
    +    private static final Log LOG = LogFactory.getLog(HiveORCBatchResolver.class);
    +
    +    private List<List<OneField>> resolvedBatch;
    +    private StructObjectInspector soi;
    +
    +    public HiveORCBatchResolver(InputData input) throws Exception {
    +        super(input);
    +        try {
    +            soi = (StructObjectInspector) HiveUtilities.getOrcReader(input).getObjectInspector();
    +        } catch (Exception e) {
    +            LOG.error("Unable to create an object inspector.");
    +            throw e;
    +        }
    +    }
    +
    +    @Override
    +    public List<List<OneField>> getFieldsForBatch(OneRow batch) {
    +
    +        Writable writableObject = null;
    +        Object fieldValue = null;
    +        VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) batch.getData();
    +
    +        // Allocate empty result set
    +        resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
    +        for (int i = 0; i < vectorizedBatch.size; i++) {
    +            ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
    --- End diff --
    
    Thanks, updated



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129347
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java ---
    @@ -289,7 +289,7 @@ private void fetchMetaData(HiveTablePartition tablePartition, boolean hasComplex
             if (inputData.getProfile() != null) {
                 // evaluate optimal profile based on file format if profile was explicitly specified in url
                 // if user passed accessor+fragmenter+resolver - use them
    -            profile = ProfileFactory.get(fformat, hasComplexTypes);
    +            profile = ProfileFactory.get(fformat, hasComplexTypes, inputData.getProfile());
    --- End diff --
    
    getProfile() is called twice (in the if statement and here); it's better to call it once and reuse the variable
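
    i.e., roughly (the local variable name is just for illustration):

        String userProfile = inputData.getProfile();
        if (userProfile != null) {
            // evaluate optimal profile based on file format if profile was explicitly specified in url
            // if user passed accessor+fragmenter+resolver - use them
            profile = ProfileFactory.get(fformat, hasComplexTypes, userProfile);
        }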



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129564
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
    @@ -0,0 +1,115 @@
    +package org.apache.hawq.pxf.plugins.hive;
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.mapred.*;
    +import org.apache.hawq.pxf.api.OneRow;
    +import org.apache.hawq.pxf.api.ReadAccessor;
    +import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
    +import org.apache.hawq.pxf.api.utilities.InputData;
    +import org.apache.hawq.pxf.api.utilities.Plugin;
    +import org.apache.hawq.pxf.api.utilities.Utilities;
    +import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
    +import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
    +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
    +import org.apache.hadoop.hive.ql.io.orc.OrcFile;
    +import org.apache.hadoop.hive.ql.io.orc.Reader;
    +import org.apache.hadoop.hive.ql.io.orc.Reader.Options;
    +import org.apache.hadoop.hive.ql.io.orc.RecordReader;
    +import org.apache.hadoop.io.LongWritable;
    +
    +/**
    + * Accessor class which reads data in batches.
    + * One batch is 1024 rows of all projected columns
    + *
    + */
    +public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
    +
    +    protected RecordReader vrr;
    +    private int batchIndex;
    +    private VectorizedRowBatch batch;
    +
    +    public HiveORCBatchAccessor(InputData input) throws Exception {
    +        super(input);
    +    }
    +
    +    @Override
    +    public boolean openForRead() throws Exception {
    +        Reader reader = HiveUtilities.getOrcReader(inputData);
    +        Options options = new Options();
    +        addColumns(options);
    +        addFragments(options);
    +        vrr = reader.rowsOptions(options);
    +        return vrr.hasNext();
    +    }
    +
    +    /**
    +     * File might have multiple splits, so this method restricts
    +     * reader to one split.
    +     * @param options reader options to modify
    +     */
    +    private void addFragments(Options options) {
    +        FileSplit fileSplit = HdfsUtilities.parseFileSplit(inputData);
    +        options.range(fileSplit.getStart(), fileSplit.getLength());
    +    }
    +
    +    /**
    +     * Reads next batch for current fragment.
    +     * @return next batch in OneRow format, key is a batch number, data is a batch
    +     */
    +    @Override
    +    public OneRow readNextObject() throws IOException {
    +        if (vrr.hasNext()) {
    +            batch = vrr.nextBatch(batch);
    +            batchIndex++;
    +            return new OneRow(new LongWritable(batchIndex), batch);
    +        } else {
    +            //All batches are exhausted
    +            return null;
    +        }
    +    }
    +
    +    /**
    +     * This method updated reader optionst to include projected columns only.
    --- End diff --
    
    typo "optionst"



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118132313
  
    --- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/ReadVectorizedBridge.java ---
    @@ -0,0 +1,126 @@
    +package org.apache.hawq.pxf.service;
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +import java.io.DataInputStream;
    +import java.io.IOException;
    +import java.util.LinkedList;
    +import java.util.List;
    +
    +import org.apache.hawq.pxf.api.BadRecordException;
    +import org.apache.hawq.pxf.api.OneField;
    +import org.apache.hawq.pxf.api.OneRow;
    +import org.apache.hawq.pxf.api.ReadAccessor;
    +import org.apache.hawq.pxf.api.ReadVectorizedResolver;
    +import org.apache.hawq.pxf.service.io.Writable;
    +import org.apache.hawq.pxf.service.utilities.ProtocolData;
    +
    +public class ReadVectorizedBridge implements Bridge {
    +
    +    ReadAccessor fileAccessor = null;
    --- End diff --
    
    not private members ?
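
    i.e., declaring them private:

        private ReadAccessor fileAccessor = null;
        private ReadVectorizedResolver fieldsResolver;
        private BridgeOutputBuilder outputBuilder = null;
        private LinkedList<Writable> outputQueue = null;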



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129472
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveMetadataFetcher.java ---
    @@ -136,7 +136,7 @@ public HiveMetadataFetcher(InputData md) {
         private OutputFormat getOutputFormat(String inputFormat, boolean hasComplexTypes) throws Exception {
             OutputFormat outputFormat = null;
             InputFormat<?, ?> fformat = HiveDataFragmenter.makeInputFormat(inputFormat, jobConf);
    -        String profile = ProfileFactory.get(fformat, hasComplexTypes);
    +        String profile = ProfileFactory.get(fformat, hasComplexTypes, null);
    --- End diff --
    
    passing explicit null params should be avoided; if possible, overload the function when more or fewer params are desired.
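
    For example, the two-argument form can stay as a thin overload that hides the null (a sketch, assuming both variants live in ProfileFactory):

        public static String get(InputFormat<?, ?> fformat, boolean hasComplexTypes) {
            // callers that have no explicit profile never see the null
            return get(fformat, hasComplexTypes, null);
        }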



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118132215
  
    --- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/ReadBridge.java ---
    @@ -149,9 +149,10 @@ public static ReadAccessor getFileAccessor(InputData inputData)
                     inputData.getAccessor(), inputData);
         }
     
    -    public static ReadResolver getFieldsResolver(InputData inputData)
    +    @SuppressWarnings("unchecked")
    --- End diff --
    
    ouch, can you make Utilities.createAnyInstance templatized instead ?
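
    A rough sketch of a templatized variant; the existing reflection logic of Utilities.createAnyInstance is assumed here, not copied from the PR (needs java.lang.reflect.Constructor):

        public static <T> T createAnyInstance(Class<T> expectedClass, String className, InputData metaData) throws Exception {
            Class<?> cls = Class.forName(className);
            Constructor<?> con = cls.getConstructor(InputData.class);
            // cast() is checked, so no unchecked-warning suppression at call sites
            return expectedClass.cast(con.newInstance(metaData));
        }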



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118129724
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchAccessor.java ---
    @@ -0,0 +1,115 @@
    +package org.apache.hawq.pxf.plugins.hive;
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.mapred.*;
    +import org.apache.hawq.pxf.api.OneRow;
    +import org.apache.hawq.pxf.api.ReadAccessor;
    +import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
    +import org.apache.hawq.pxf.api.utilities.InputData;
    +import org.apache.hawq.pxf.api.utilities.Plugin;
    +import org.apache.hawq.pxf.api.utilities.Utilities;
    +import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
    +import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
    +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
    +import org.apache.hadoop.hive.ql.io.orc.OrcFile;
    +import org.apache.hadoop.hive.ql.io.orc.Reader;
    +import org.apache.hadoop.hive.ql.io.orc.Reader.Options;
    +import org.apache.hadoop.hive.ql.io.orc.RecordReader;
    +import org.apache.hadoop.io.LongWritable;
    +
    +/**
    + * Accessor class which reads data in batches.
    + * One batch is 1024 rows of all projected columns
    + *
    + */
    +public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
    --- End diff --
    
    would it be useful if it extended HiveORCAccessor and overrode its functions ?
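
    Roughly like this, assuming the ORC RecordReader field (vrr) and its setup move up into HiveORCAccessor as protected; purely a sketch:

        public class HiveORCBatchAccessor extends HiveORCAccessor {

            private VectorizedRowBatch batch;
            private long batchIndex;

            public HiveORCBatchAccessor(InputData input) throws Exception {
                super(input); // parent builds the reader, options and split range
            }

            @Override
            public OneRow readNextObject() throws IOException {
                if (!vrr.hasNext()) {
                    return null; // all batches are exhausted
                }
                batch = vrr.nextBatch(batch);
                batchIndex++;
                return new OneRow(new LongWritable(batchIndex), batch);
            }
        }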



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118807815
  
    --- Diff: pxf/pxf-hive/src/test/java/org/apache/hawq/pxf/plugins/hive/utilities/ProfileFactoryTest.java ---
    @@ -34,31 +34,31 @@
         public void get() throws Exception {
     
             // For TextInputFormat when table has no complex types, HiveText profile should be used
    -        String profileName = ProfileFactory.get(new TextInputFormat(), false);
    +        String profileName = ProfileFactory.get(new TextInputFormat(), false, null);
    --- End diff --
    
    can revert these changes now that the function with 2 arguments is back, right ?
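
    i.e., back to the original two-argument call from the diff:

        String profileName = ProfileFactory.get(new TextInputFormat(), false);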



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118130590
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
    @@ -0,0 +1,257 @@
    +package org.apache.hawq.pxf.plugins.hive;
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
    +import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
    +import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
    +import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
    +import static org.apache.hawq.pxf.api.io.DataType.DATE;
    +import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
    +import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
    +import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
    +import static org.apache.hawq.pxf.api.io.DataType.REAL;
    +import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
    +import static org.apache.hawq.pxf.api.io.DataType.TEXT;
    +import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
    +import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
    +
    +import java.math.BigDecimal;
    +import java.util.ArrayList;
    +import java.util.Calendar;
    +import java.util.List;
    +import java.sql.Timestamp;
    +import java.sql.Date;
    +
    +import org.apache.commons.logging.Log;
    +import org.apache.commons.logging.LogFactory;
    +import org.apache.hadoop.hive.common.type.HiveDecimal;
    +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
    +import org.apache.hadoop.io.Writable;
    +import org.apache.hadoop.io.LongWritable;
    +import org.apache.hadoop.io.DoubleWritable;
    +import org.apache.hadoop.io.FloatWritable;
    +import org.apache.hadoop.io.Text;
    +import org.apache.hawq.pxf.api.OneField;
    +import org.apache.hawq.pxf.api.OneRow;
    +import org.apache.hawq.pxf.api.ReadVectorizedResolver;
    +import org.apache.hawq.pxf.api.UnsupportedTypeException;
    +import org.apache.hawq.pxf.api.io.DataType;
    +import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
    +import org.apache.hawq.pxf.api.utilities.InputData;
    +import org.apache.hawq.pxf.api.utilities.Plugin;
    +import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
    +import org.apache.hadoop.hive.serde2.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    +import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
    +import org.apache.hadoop.hive.ql.exec.vector.*;
    +
    +@SuppressWarnings("deprecation")
    +public class HiveORCBatchResolver extends Plugin implements ReadVectorizedResolver {
    +
    +    private static final Log LOG = LogFactory.getLog(HiveORCBatchResolver.class);
    +
    +    private List<List<OneField>> resolvedBatch;
    +    private StructObjectInspector soi;
    +
    +    public HiveORCBatchResolver(InputData input) throws Exception {
    +        super(input);
    +        try {
    +            soi = (StructObjectInspector) HiveUtilities.getOrcReader(input).getObjectInspector();
    +        } catch (Exception e) {
    +            LOG.error("Unable to create an object inspector.");
    +            throw e;
    +        }
    +    }
    +
    +    @Override
    +    public List<List<OneField>> getFieldsForBatch(OneRow batch) {
    +
    +        Writable writableObject = null;
    +        Object fieldValue = null;
    +        VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) batch.getData();
    +
    +        // Allocate empty result set
    +        resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
    +        for (int i = 0; i < vectorizedBatch.size; i++) {
    +            ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
    --- End diff --
    
    call inputData.getColumns() once outside the for loop if the returned value is always the same
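
    i.e., hoisting the call once above both loops:

        int columns = inputData.getColumns();
        resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
        for (int i = 0; i < vectorizedBatch.size; i++) {
            ArrayList<OneField> row = new ArrayList<OneField>(columns);
            resolvedBatch.add(row);
            for (int j = 0; j < columns; j++) {
                row.add(null);
            }
        }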



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118788222
  
    --- Diff: pxf/pxf-service/src/main/resources/pxf-profiles-default.xml ---
    @@ -101,6 +101,17 @@ under the License.
                 <outputFormat>org.apache.hawq.pxf.service.io.GPDBWritable</outputFormat>
             </plugins>
         </profile>
    +        <profile>
    +        <name>HiveVectorizedORC</name>
    --- End diff --
    
    Renamed all classes to use "vectorized"



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118132761
  
    --- Diff: pxf/pxf-service/src/main/resources/pxf-profiles-default.xml ---
    @@ -101,6 +101,17 @@ under the License.
                 <outputFormat>org.apache.hawq.pxf.service.io.GPDBWritable</outputFormat>
             </plugins>
         </profile>
    +        <profile>
    +        <name>HiveVectorizedORC</name>
    --- End diff --
    
    seems like "batch" and "vectorized" are used interchangeably, should we use just one term ?



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118131278
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
    @@ -0,0 +1,257 @@
    +package org.apache.hawq.pxf.plugins.hive;
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
    +import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
    +import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
    +import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
    +import static org.apache.hawq.pxf.api.io.DataType.DATE;
    +import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
    +import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
    +import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
    +import static org.apache.hawq.pxf.api.io.DataType.REAL;
    +import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
    +import static org.apache.hawq.pxf.api.io.DataType.TEXT;
    +import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
    +import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
    +
    +import java.math.BigDecimal;
    +import java.util.ArrayList;
    +import java.util.Calendar;
    +import java.util.List;
    +import java.sql.Timestamp;
    +import java.sql.Date;
    +
    +import org.apache.commons.logging.Log;
    +import org.apache.commons.logging.LogFactory;
    +import org.apache.hadoop.hive.common.type.HiveDecimal;
    +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
    +import org.apache.hadoop.io.Writable;
    +import org.apache.hadoop.io.LongWritable;
    +import org.apache.hadoop.io.DoubleWritable;
    +import org.apache.hadoop.io.FloatWritable;
    +import org.apache.hadoop.io.Text;
    +import org.apache.hawq.pxf.api.OneField;
    +import org.apache.hawq.pxf.api.OneRow;
    +import org.apache.hawq.pxf.api.ReadVectorizedResolver;
    +import org.apache.hawq.pxf.api.UnsupportedTypeException;
    +import org.apache.hawq.pxf.api.io.DataType;
    +import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
    +import org.apache.hawq.pxf.api.utilities.InputData;
    +import org.apache.hawq.pxf.api.utilities.Plugin;
    +import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
    +import org.apache.hadoop.hive.serde2.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    +import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
    +import org.apache.hadoop.hive.ql.exec.vector.*;
    +
    +@SuppressWarnings("deprecation")
    +public class HiveORCBatchResolver extends Plugin implements ReadVectorizedResolver {
    +
    +    private static final Log LOG = LogFactory.getLog(HiveORCBatchResolver.class);
    +
    +    private List<List<OneField>> resolvedBatch;
    +    private StructObjectInspector soi;
    +
    +    public HiveORCBatchResolver(InputData input) throws Exception {
    +        super(input);
    +        try {
    +            soi = (StructObjectInspector) HiveUtilities.getOrcReader(input).getObjectInspector();
    +        } catch (Exception e) {
    +            LOG.error("Unable to create an object inspector.");
    +            throw e;
    +        }
    +    }
    +
    +    @Override
    +    public List<List<OneField>> getFieldsForBatch(OneRow batch) {
    +
    +        Writable writableObject = null;
    +        Object fieldValue = null;
    +        VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) batch.getData();
    +
    +        // Allocate empty result set
    +        resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
    +        for (int i = 0; i < vectorizedBatch.size; i++) {
    +            ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
    +            resolvedBatch.add(row);
    +            for (int j = 0; j < inputData.getColumns(); j++) {
    +                row.add(null);
    +            }
    +        }
    +
    +        /* process all columns*/
    +        for (int columnIndex = 0; columnIndex < vectorizedBatch.numCols; columnIndex++) {
    +            ObjectInspector oi = soi.getAllStructFieldRefs().get(columnIndex).getFieldObjectInspector();
    +            if (oi.getCategory() == Category.PRIMITIVE) {
    +                PrimitiveObjectInspector poi = (PrimitiveObjectInspector ) oi;
    +                resolvePrimitiveColumn(columnIndex, oi, vectorizedBatch);
    +            } else {
    +                throw new UnsupportedTypeException("Unable to resolve column index:" +  columnIndex + ". Only primitive types are supported.");
    +            }
    +        }
    +
    +        return resolvedBatch;
    +    }
    +
    +    private void resolvePrimitiveColumn(int columnIndex, ObjectInspector oi, VectorizedRowBatch vectorizedBatch) {
    +
    +        OneField field = null;
    +        Writable writableObject = null;
    +        /* process all rows from current batch for given column */
    +        for (int rowIndex = 0; rowIndex < vectorizedBatch.size; rowIndex++) {
    +            if (vectorizedBatch.cols[columnIndex] != null && !vectorizedBatch.cols[columnIndex].isNull[rowIndex]) {
    +                writableObject = vectorizedBatch.cols[columnIndex].getWritableObject(rowIndex);
    --- End diff --
    
    vectorizedBatch.cols[columnIndex] is used 3 times, substitute it with a variable ?
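
    e.g., extracting it once per column (ColumnVector comes from the already-imported org.apache.hadoop.hive.ql.exec.vector package):

        ColumnVector column = vectorizedBatch.cols[columnIndex];
        if (column == null) {
            return; // nothing to resolve for this column
        }
        for (int rowIndex = 0; rowIndex < vectorizedBatch.size; rowIndex++) {
            if (!column.isNull[rowIndex]) {
                Writable writableObject = column.getWritableObject(rowIndex);
                // ... resolve writableObject into a OneField as before
            }
        }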



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118132449
  
    --- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/ReadVectorizedBridge.java ---
    @@ -0,0 +1,126 @@
    +package org.apache.hawq.pxf.service;
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +import java.io.DataInputStream;
    +import java.io.IOException;
    +import java.util.LinkedList;
    +import java.util.List;
    +
    +import org.apache.hawq.pxf.api.BadRecordException;
    +import org.apache.hawq.pxf.api.OneField;
    +import org.apache.hawq.pxf.api.OneRow;
    +import org.apache.hawq.pxf.api.ReadAccessor;
    +import org.apache.hawq.pxf.api.ReadVectorizedResolver;
    +import org.apache.hawq.pxf.service.io.Writable;
    +import org.apache.hawq.pxf.service.utilities.ProtocolData;
    +
    +public class ReadVectorizedBridge implements Bridge {
    +
    +    ReadAccessor fileAccessor = null;
    +    ReadVectorizedResolver fieldsResolver;
    +    BridgeOutputBuilder outputBuilder = null;
    +    LinkedList<Writable> outputQueue = null;
    +
    +    public ReadVectorizedBridge(ProtocolData protData) throws Exception {
    +        outputBuilder = new BridgeOutputBuilder(protData);
    +        outputQueue = new LinkedList<Writable>();
    +        fileAccessor = ReadBridge.getFileAccessor(protData);
    +        fieldsResolver = ReadBridge.getFieldsResolver(protData);
    +    }
    +
    +    @Override
    +    public Writable getNext() throws Exception {
    +        Writable output = null;
    +        OneRow batch = null;
    +
    +        if (!outputQueue.isEmpty()) {
    +            return outputQueue.pop();
    +        }
    +
    +        try {
    +            while (outputQueue.isEmpty()) {
    +                batch = fileAccessor.readNextObject();
    +                if (batch == null) {
    +                    output = outputBuilder.getPartialLine();
    +                    if (output != null) {
    +                        //LOG.warn("A partial record in the end of the fragment");
    --- End diff --
    
    remove commented lines ?



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by denalex <gi...@git.apache.org>.
Github user denalex commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118131006
  
    --- Diff: pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCBatchResolver.java ---
    @@ -0,0 +1,257 @@
    +package org.apache.hawq.pxf.plugins.hive;
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +import static org.apache.hawq.pxf.api.io.DataType.BIGINT;
    +import static org.apache.hawq.pxf.api.io.DataType.BOOLEAN;
    +import static org.apache.hawq.pxf.api.io.DataType.BPCHAR;
    +import static org.apache.hawq.pxf.api.io.DataType.BYTEA;
    +import static org.apache.hawq.pxf.api.io.DataType.DATE;
    +import static org.apache.hawq.pxf.api.io.DataType.FLOAT8;
    +import static org.apache.hawq.pxf.api.io.DataType.INTEGER;
    +import static org.apache.hawq.pxf.api.io.DataType.NUMERIC;
    +import static org.apache.hawq.pxf.api.io.DataType.REAL;
    +import static org.apache.hawq.pxf.api.io.DataType.SMALLINT;
    +import static org.apache.hawq.pxf.api.io.DataType.TEXT;
    +import static org.apache.hawq.pxf.api.io.DataType.TIMESTAMP;
    +import static org.apache.hawq.pxf.api.io.DataType.VARCHAR;
    +
    +import java.math.BigDecimal;
    +import java.util.ArrayList;
    +import java.util.Calendar;
    +import java.util.List;
    +import java.sql.Timestamp;
    +import java.sql.Date;
    +
    +import org.apache.commons.logging.Log;
    +import org.apache.commons.logging.LogFactory;
    +import org.apache.hadoop.hive.common.type.HiveDecimal;
    +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
    +import org.apache.hadoop.io.Writable;
    +import org.apache.hadoop.io.LongWritable;
    +import org.apache.hadoop.io.DoubleWritable;
    +import org.apache.hadoop.io.FloatWritable;
    +import org.apache.hadoop.io.Text;
    +import org.apache.hawq.pxf.api.OneField;
    +import org.apache.hawq.pxf.api.OneRow;
    +import org.apache.hawq.pxf.api.ReadVectorizedResolver;
    +import org.apache.hawq.pxf.api.UnsupportedTypeException;
    +import org.apache.hawq.pxf.api.io.DataType;
    +import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
    +import org.apache.hawq.pxf.api.utilities.InputData;
    +import org.apache.hawq.pxf.api.utilities.Plugin;
    +import org.apache.hawq.pxf.plugins.hive.utilities.HiveUtilities;
    +import org.apache.hadoop.hive.serde2.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    +import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
    +import org.apache.hadoop.hive.ql.exec.vector.*;
    +
    +@SuppressWarnings("deprecation")
    +public class HiveORCBatchResolver extends Plugin implements ReadVectorizedResolver {
    +
    +    private static final Log LOG = LogFactory.getLog(HiveORCBatchResolver.class);
    +
    +    private List<List<OneField>> resolvedBatch;
    +    private StructObjectInspector soi;
    +
    +    public HiveORCBatchResolver(InputData input) throws Exception {
    +        super(input);
    +        try {
    +            soi = (StructObjectInspector) HiveUtilities.getOrcReader(input).getObjectInspector();
    +        } catch (Exception e) {
    +            LOG.error("Unable to create an object inspector.");
    +            throw e;
    +        }
    +    }
    +
    +    @Override
    +    public List<List<OneField>> getFieldsForBatch(OneRow batch) {
    +
    +        Writable writableObject = null;
    +        Object fieldValue = null;
    +        VectorizedRowBatch vectorizedBatch = (VectorizedRowBatch) batch.getData();
    +
    +        // Allocate empty result set
    +        resolvedBatch = new ArrayList<List<OneField>>(vectorizedBatch.size);
    +        for (int i = 0; i < vectorizedBatch.size; i++) {
    +            ArrayList<OneField> row = new ArrayList<OneField>(inputData.getColumns());
    +            resolvedBatch.add(row);
    +            for (int j = 0; j < inputData.getColumns(); j++) {
    +                row.add(null);
    +            }
    +        }
    +
    +        /* process all columns*/
    +        for (int columnIndex = 0; columnIndex < vectorizedBatch.numCols; columnIndex++) {
    +            ObjectInspector oi = soi.getAllStructFieldRefs().get(columnIndex).getFieldObjectInspector();
    +            if (oi.getCategory() == Category.PRIMITIVE) {
    +                PrimitiveObjectInspector poi = (PrimitiveObjectInspector ) oi;
    --- End diff --
    
    bracket alignment
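
    i.e., dropping the stray space before the closing parenthesis:

        PrimitiveObjectInspector poi = (PrimitiveObjectInspector) oi;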



[GitHub] incubator-hawq pull request #1225: HAWQ-1446: Introduce vectorized profile f...

Posted by sansanichfb <gi...@git.apache.org>.
Github user sansanichfb commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1225#discussion_r118766404
  
    --- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/ReadVectorizedBridge.java ---
    @@ -0,0 +1,126 @@
    +package org.apache.hawq.pxf.service;
    --- End diff --
    
    Makes sense, extended.

