Posted to issues@tajo.apache.org by hyunsik <gi...@git.apache.org> on 2014/10/06 07:34:33 UTC

[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

GitHub user hyunsik opened a pull request:

    https://github.com/apache/tajo/pull/181

    TAJO-1095: Implement Json file scanner.

    This is still a work in progress. I'll improve its unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hyunsik/tajo TAJO-1095

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/181.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #181
    
----
commit f0aeb800b3879106d78ed91561c958e041d43daa
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-10-06T05:32:22Z

    TAJO-1095: Implement Json file scanner.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on the pull request:

    https://github.com/apache/tajo/pull/181#issuecomment-64865141
  
    +1
    Looks great to me!



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/181#issuecomment-64839655
  
    updated.



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/181#discussion_r18688459
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java ---
    @@ -249,36 +249,45 @@ public Expr visitInsert(Context context, Stack<Expr> stack, Insert expr) throws
     
         if (child != null && child.getType() == OpType.Projection) {
           Projection projection = (Projection) child;
    -      int projectColumnNum = projection.getNamedExprs().length;
     
    -      if (expr.hasTargetColumns()) {
    -        int targetColumnNum = expr.getTargetColumns().length;
    +      boolean includeAsterisk = true;
     
    -        if (targetColumnNum > projectColumnNum)  {
    -          context.state.addVerification("INSERT has more target columns than expressions");
    -        } else if (targetColumnNum < projectColumnNum) {
    -          context.state.addVerification("INSERT has more expressions than target columns");
    -        }
    -      } else {
    -        if (expr.hasTableName()) {
    -          String qualifiedName = expr.getTableName();
    -          if (TajoConstants.EMPTY_STRING.equals(CatalogUtil.extractQualifier(expr.getTableName()))) {
    -            qualifiedName = CatalogUtil.buildFQName(context.queryContext.getCurrentDatabase(),
    -                expr.getTableName());
    -          }
    +      for (NamedExpr namedExpr : projection.getNamedExprs()) {
    +        includeAsterisk |= namedExpr.getExpr().getType() != OpType.Asterisk;
    --- End diff --
    
    It seems that 'includeAsterisk' is always true: it is initialized to true and then only ORed with other boolean values, so it can never become false.
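
A minimal, self-contained sketch of the point raised in the review comment above. `OpType` here is a hypothetical two-value stand-in for Tajo's enum, and `containsAsterisk` is only one possible way to detect an asterisk; it is not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

class AsteriskCheckSketch {
  // Hypothetical stand-in for Tajo's OpType; only the two values needed here.
  enum OpType { Asterisk, Column }

  // Mirrors the accumulation in the diff: the flag starts at true and is only
  // ORed with further values, so the result is true for every input.
  static boolean asInDiff(List<OpType> exprTypes) {
    boolean includeAsterisk = true;
    for (OpType type : exprTypes) {
      includeAsterisk |= type != OpType.Asterisk;
    }
    return includeAsterisk;
  }

  // One possible alternative (an assumption, not the author's fix):
  // start at false and OR in a positive match.
  static boolean containsAsterisk(List<OpType> exprTypes) {
    boolean includeAsterisk = false;
    for (OpType type : exprTypes) {
      includeAsterisk |= type == OpType.Asterisk;
    }
    return includeAsterisk;
  }

  public static void main(String[] args) {
    List<OpType> withStar = Arrays.asList(OpType.Column, OpType.Asterisk);
    List<OpType> withoutStar = Arrays.asList(OpType.Column, OpType.Column);
    System.out.println(asInDiff(withStar) + " " + asInDiff(withoutStar));
    System.out.println(containsAsterisk(withStar) + " " + containsAsterisk(withoutStar));
  }
}
```

With the accumulation from the diff, a following `if (!includeAsterisk)` branch would never be entered, whichever expressions the projection contains.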



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/181#issuecomment-58613139
  
    Hi Hyunsik.
    Thanks for your contribution. This is a truly necessary feature.
    JSonScanner looks good, and the relevant test passed.
    However, I suspect that some lines in PreLogicalPlanVerifier may cause problems. I left some comments on them.
    In addition, TestTablePartitions.testColumnPartitionedTableWithSmallerExpressions1() and TestTablePartitions.testColumnPartitionedTableWithSmallerExpressions2() fail. Please check them.
    
    Thanks.



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/181#discussion_r18688360
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/optimizer/eval/EvalTreeOptimizer.java ---
    @@ -70,6 +70,7 @@ public EvalNode optimize(LogicalPlanner.PlanContext context, EvalNode node) {
     
         EvalNode optimized = node;
         for (EvalTreeOptimizationRule rule : rules) {
    +      LOG.info(node);
    --- End diff --
    
    Is this log necessary?
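
One common way to handle a log statement like the one questioned above is to move it to a debug level and guard it with a level check. Nobody in this thread proposed that; it is an illustration only. The sketch uses `java.util.logging` as a stand-in for whatever logging facade the project uses, and `runRules` and `CountingHandler` are hypothetical names.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class GuardedLogSketch {
  // Counts published records so the effect of the level check is observable.
  static final class CountingHandler extends Handler {
    int count = 0;
    @Override public void publish(LogRecord record) { count++; }
    @Override public void flush() {}
    @Override public void close() {}
  }

  // Applies ruleCount hypothetical rules and logs the node at debug (FINE)
  // level for each one; returns how many records were actually emitted.
  static int runRules(Level loggerLevel, int ruleCount) {
    Logger log = Logger.getAnonymousLogger();
    log.setUseParentHandlers(false); // keep the sketch silent on the console
    log.setLevel(loggerLevel);
    CountingHandler handler = new CountingHandler();
    log.addHandler(handler);

    Object node = "a + 1"; // placeholder for an expression tree node
    for (int i = 0; i < ruleCount; i++) {
      // The guard skips building the message when debug output is disabled.
      if (log.isLoggable(Level.FINE)) {
        log.fine(String.valueOf(node));
      }
    }
    return handler.count;
  }

  public static void main(String[] args) {
    System.out.println(runRules(Level.INFO, 3));
    System.out.println(runRules(Level.FINE, 3));
  }
}
```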



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/181#discussion_r18688493
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java ---
    @@ -249,36 +249,45 @@ public Expr visitInsert(Context context, Stack<Expr> stack, Insert expr) throws
     
         if (child != null && child.getType() == OpType.Projection) {
           Projection projection = (Projection) child;
    -      int projectColumnNum = projection.getNamedExprs().length;
     
    -      if (expr.hasTargetColumns()) {
    -        int targetColumnNum = expr.getTargetColumns().length;
    +      boolean includeAsterisk = true;
     
    -        if (targetColumnNum > projectColumnNum)  {
    -          context.state.addVerification("INSERT has more target columns than expressions");
    -        } else if (targetColumnNum < projectColumnNum) {
    -          context.state.addVerification("INSERT has more expressions than target columns");
    -        }
    -      } else {
    -        if (expr.hasTableName()) {
    -          String qualifiedName = expr.getTableName();
    -          if (TajoConstants.EMPTY_STRING.equals(CatalogUtil.extractQualifier(expr.getTableName()))) {
    -            qualifiedName = CatalogUtil.buildFQName(context.queryContext.getCurrentDatabase(),
    -                expr.getTableName());
    -          }
    +      for (NamedExpr namedExpr : projection.getNamedExprs()) {
    +        includeAsterisk |= namedExpr.getExpr().getType() != OpType.Asterisk;
    +      }
    +
    +      if (!includeAsterisk) {
    --- End diff --
    
    Maybe we need to handle the case in which the projection includes an asterisk. Will this case be handled in another issue?



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/181#discussion_r18688503
  
    --- Diff: tajo-core/src/test/java/org/apache/tajo/LocalTajoTestingUtility.java ---
    @@ -18,9 +18,11 @@
     
     package org.apache.tajo;
     
    +import com.google.common.base.Preconditions;
    --- End diff --
    
    Please remove unused imports.



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/181#discussion_r21014823
  
    --- Diff: tajo-storage/src/main/java/org/apache/tajo/storage/json/JsonLineDeserializer.java ---
    @@ -0,0 +1,222 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.tajo.storage.json;
    +
    +
    +import io.netty.buffer.ByteBuf;
    +import net.minidev.json.JSONArray;
    +import net.minidev.json.JSONObject;
    +import net.minidev.json.parser.JSONParser;
    +import org.apache.tajo.catalog.Schema;
    +import org.apache.tajo.catalog.SchemaUtil;
    +import org.apache.tajo.catalog.TableMeta;
    +import org.apache.tajo.common.TajoDataTypes.Type;
    +import org.apache.tajo.common.exception.NotImplementedException;
    +import org.apache.tajo.datum.DatumFactory;
    +import org.apache.tajo.datum.NullDatum;
    +import org.apache.tajo.datum.protobuf.ProtobufJsonFormat;
    +import org.apache.tajo.storage.Tuple;
    +import org.apache.tajo.storage.text.TextLineDeserializer;
    +
    +import java.io.IOException;
    +import java.util.Iterator;
    +
    +public class JsonLineDeserializer extends TextLineDeserializer {
    +  private static ProtobufJsonFormat protobufJsonFormat = ProtobufJsonFormat.getInstance();
    --- End diff --
    
    Could you remove this line? It seems to be unused code.



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/181#discussion_r21016228
  
    --- Diff: tajo-storage/src/main/java/org/apache/tajo/storage/json/JsonLineSerializer.java ---
    @@ -0,0 +1,131 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.tajo.storage.json;
    +
    +
    +import net.minidev.json.JSONObject;
    +import org.apache.tajo.catalog.Schema;
    +import org.apache.tajo.catalog.SchemaUtil;
    +import org.apache.tajo.catalog.TableMeta;
    +import org.apache.tajo.common.TajoDataTypes.Type;
    +import org.apache.tajo.common.exception.NotImplementedException;
    +import org.apache.tajo.datum.ProtobufDatum;
    +import org.apache.tajo.datum.protobuf.ProtobufJsonFormat;
    +import org.apache.tajo.storage.Tuple;
    +import org.apache.tajo.storage.text.TextLineSerDe;
    +import org.apache.tajo.storage.text.TextLineSerializer;
    +
    +import java.io.IOException;
    +import java.io.OutputStream;
    +
    +public class JsonLineSerializer extends TextLineSerializer {
    +  private static ProtobufJsonFormat protobufJsonFormat = ProtobufJsonFormat.getInstance();
    +
    +  private Type [] types;
    +  private String [] simpleNames;
    +  private int columnNum;
    +
    +
    +  public JsonLineSerializer(Schema schema, TableMeta meta) {
    +    super(schema, meta);
    +  }
    +
    +  @Override
    +  public void init() {
    +    types = SchemaUtil.toTypes(schema);
    +    simpleNames = SchemaUtil.toSimpleNames(schema);
    +    columnNum = schema.size();
    +  }
    +
    +  @Override
    +  public int serialize(OutputStream out, Tuple input) throws IOException {
    +    JSONObject jsonObject = new JSONObject();
    +
    +    for (int i = 0; i < columnNum; i++) {
    +      if (input.isNull(i)) {
    +        continue;
    +      }
    +
    +      String fieldName = simpleNames[i];
    +      Type type = types[i];
    +
    +      switch (type) {
    +
    +      case BOOLEAN:
    +        jsonObject.put(fieldName, input.getBool(i));
    +        break;
    +
    +      case INT1:
    +      case INT2:
    +        jsonObject.put(fieldName, input.getInt2(i));
    +        break;
    +
    +      case INT4:
    +        jsonObject.put(fieldName, input.getInt4(i));
    +        break;
    +
    +      case INT8:
    +        jsonObject.put(fieldName, input.getInt8(i));
    +        break;
    +
    +      case FLOAT4:
    +        jsonObject.put(fieldName, input.getFloat4(i));
    +        break;
    +
    +      case FLOAT8:
    +        jsonObject.put(fieldName, input.getFloat8(i));
    +        break;
    +
    +      case CHAR:
    +      case TEXT:
    +      case VARCHAR:
    +      case INET4:
    +      case TIMESTAMP:
    +      case DATE:
    +      case TIME:
    +      case INTERVAL:
    +        jsonObject.put(fieldName, input.getText(i));
    +        break;
    +
    +      case BIT:
    +      case BINARY:
    +      case BLOB:
    +      case VARBINARY:
    +        jsonObject.put(fieldName, input.getBytes(i));
    +        break;
    +
    +      case NULL_TYPE:
    +        break;
    +
    +      default:
    +        throw new NotImplementedException(types[i].name() + " is not supported.");
    +      }
    +    }
    +
    +    String jsonStr = jsonObject.toJSONString();
    +    byte [] jsonBytes = jsonStr.getBytes();
    --- End diff --
    
    Could you specify the charset explicitly as UTF-8?
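
A minimal sketch of the request above, assuming Java 7 or later so that `java.nio.charset.StandardCharsets` is available. `String.getBytes()` without arguments uses the platform default charset, so the bytes written can differ between machines. The class and method names are hypothetical and not part of the patch.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesSketch {
  // Encodes a JSON string with an explicit charset instead of the
  // platform default used by the no-argument String.getBytes().
  static byte[] toUtf8(String jsonStr) {
    return jsonStr.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // The value holds two Hangul characters, written as Unicode escapes so
    // that the source file's own encoding does not matter.
    String jsonStr = "{\"name\":\"\uD0C0\uC870\"}";
    byte[] jsonBytes = toUtf8(jsonStr);

    // Decoding with the same charset round-trips regardless of file.encoding.
    String decoded = new String(jsonBytes, StandardCharsets.UTF_8);
    System.out.println(decoded.equals(jsonStr));
    System.out.println(jsonBytes.length);
  }
}
```

The same idea applies on the reading side: decoding the bytes with an explicit charset keeps the serializer and the deserializer in agreement.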



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/181#issuecomment-64819022
  
    I rebased and addressed your comments. Also, I changed the patch to use the pluggable text line serde (https://issues.apache.org/jira/browse/TAJO-1209), which was added recently.



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/181#issuecomment-64858020
  
    Thank you for your comments. I addressed them and rebased the patch.



[GitHub] tajo pull request: TAJO-1095: Implement Json file scanner.

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/181

