You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by navis <gi...@git.apache.org> on 2015/10/22 04:38:17 UTC

[GitHub] phoenix pull request: PHOENIX-2288 Phoenix-Spark: PDecimal precisi...

GitHub user navis opened a pull request:

    https://github.com/apache/phoenix/pull/124

    PHOENIX-2288 Phoenix-Spark: PDecimal precision and scale aren't carried through to Spark DataFrame

    from jira description
    
    >When loading a Spark dataframe from a Phoenix table with a 'DECIMAL' type, the underlying precision and scale aren't carried forward to Spark.
    
    >The Spark catalyst schema converter should load these from the underlying column. These appear to be exposed in the ResultSetMetaData, but if there was a way to expose these somehow through ColumnInfo, it would be cleaner.
    
    >I'm not sure if Pig has the same issues or not, but I suspect it may.
    
    
    It seemed enough just for current usage in spark-interagation. But in long term, PDataType should contain meta information like maxLength or precision, etc.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/navis/phoenix PHOENIX-2288

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #124
    
----
commit 25fa16bcef4e4fcbc9fcf07d935839ed563a9b52
Author: navis.ryu <na...@apache.org>
Date:   2015-10-22T02:32:30Z

    PHOENIX-2288 Phoenix-Spark: PDecimal precision and scale aren't carried through to Spark DataFrame

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request: PHOENIX-2288 Phoenix-Spark: PDecimal precisi...

Posted by navis <gi...@git.apache.org>.
Github user navis commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/124#discussion_r42825749
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/util/PhoenixRuntime.java ---
    @@ -459,19 +459,30 @@ public static ColumnInfo getColumnInfo(PTable table, String columnName) throws S
             return getColumnInfo(pColumn);
         }
     
    -   /**
    +    /**
          * Constructs a column info for the supplied pColumn
          * @param pColumn
          * @return columnInfo
          * @throws SQLException if the parameter is null.
          */
         public static ColumnInfo getColumnInfo(PColumn pColumn) throws SQLException {
    -        if (pColumn==null) {
    +        if (pColumn == null) {
                 throw new SQLException("pColumn must not be null.");
             }
             int sqlType = pColumn.getDataType().getSqlType();
    -        ColumnInfo columnInfo = new ColumnInfo(pColumn.toString(),sqlType);
    -        return columnInfo;
    +        if (pColumn.getMaxLength() == null) {
    +            return new ColumnInfo(pColumn.toString(), sqlType);
    +        }
    +        if (sqlType == Types.CHAR || sqlType == Types.VARCHAR) {
    --- End diff --
    
    @JamesRTaylor How about to move the logic above into PColumn? Then we can access full information in it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request: PHOENIX-2288 Phoenix-Spark: PDecimal precisi...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/124#discussion_r42823983
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/util/PhoenixRuntime.java ---
    @@ -459,19 +459,30 @@ public static ColumnInfo getColumnInfo(PTable table, String columnName) throws S
             return getColumnInfo(pColumn);
         }
     
    -   /**
    +    /**
          * Constructs a column info for the supplied pColumn
          * @param pColumn
          * @return columnInfo
          * @throws SQLException if the parameter is null.
          */
         public static ColumnInfo getColumnInfo(PColumn pColumn) throws SQLException {
    -        if (pColumn==null) {
    +        if (pColumn == null) {
                 throw new SQLException("pColumn must not be null.");
             }
             int sqlType = pColumn.getDataType().getSqlType();
    -        ColumnInfo columnInfo = new ColumnInfo(pColumn.toString(),sqlType);
    -        return columnInfo;
    +        if (pColumn.getMaxLength() == null) {
    +            return new ColumnInfo(pColumn.toString(), sqlType);
    +        }
    +        if (sqlType == Types.CHAR || sqlType == Types.VARCHAR) {
    --- End diff --
    
    Rather than check for particular types, it'd be more general to check for null like this:
    
        Integer maxLength = pColumn.getMaxLength();
        Integer scale = pColumn.getScale();
        return new ColumnInfo(pColumn.toString(), sqlType, maxLength, scale);
    
    Then make sure that ColumnInfo handles a null maxLength and scale.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request: PHOENIX-2288 Phoenix-Spark: PDecimal precisi...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/124#discussion_r42826328
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/util/PhoenixRuntime.java ---
    @@ -459,19 +459,30 @@ public static ColumnInfo getColumnInfo(PTable table, String columnName) throws S
             return getColumnInfo(pColumn);
         }
     
    -   /**
    +    /**
          * Constructs a column info for the supplied pColumn
          * @param pColumn
          * @return columnInfo
          * @throws SQLException if the parameter is null.
          */
         public static ColumnInfo getColumnInfo(PColumn pColumn) throws SQLException {
    -        if (pColumn==null) {
    +        if (pColumn == null) {
                 throw new SQLException("pColumn must not be null.");
             }
             int sqlType = pColumn.getDataType().getSqlType();
    -        ColumnInfo columnInfo = new ColumnInfo(pColumn.toString(),sqlType);
    -        return columnInfo;
    +        if (pColumn.getMaxLength() == null) {
    +            return new ColumnInfo(pColumn.toString(), sqlType);
    +        }
    +        if (sqlType == Types.CHAR || sqlType == Types.VARCHAR) {
    --- End diff --
    
    ColumnInfo is a kind of lightweight transport class solely for passing in the necessary column metadata for the MR and Spark integration to run. It's passed in through the config so it has some simple to/from string methods - this prevents us from having to lookup the metadata from Phoenix metadata using the regular JDBC metadata APIs (which would be another option). Having this ColumnInfo class was deemed slightly easier.
    
    PColumn has more information than we need and it'd be best to keep this as an internal/private class as much as possible. It's the object representation of our column metadata.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request: PHOENIX-2288 Phoenix-Spark: PDecimal precisi...

Posted by jmahonin <gi...@git.apache.org>.
Github user jmahonin commented on the pull request:

    https://github.com/apache/phoenix/pull/124#issuecomment-151491230
  
    How's this look, @JamesRTaylor / @ravimagham ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request: PHOENIX-2288 Phoenix-Spark: PDecimal precisi...

Posted by JamesRTaylor <gi...@git.apache.org>.
Github user JamesRTaylor commented on the pull request:

    https://github.com/apache/phoenix/pull/124#issuecomment-150397877
  
    Thanks for the pull request, @navis. A couple of minor comments, but overall it looks great. FYI, our PDataType class is stateless (it was an enum originally), so we currently access maxLength/precision and scale through the PDatum interface (from which PColumn and Expression are derived). Now that PDataType is no longer an enum, it might be nice to allow instantiation with maxLength and scale provided at construction time. Please file a JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] phoenix pull request: PHOENIX-2288 Phoenix-Spark: PDecimal precisi...

Posted by jmahonin <gi...@git.apache.org>.
Github user jmahonin commented on the pull request:

    https://github.com/apache/phoenix/pull/124#issuecomment-150211117
  
    This looks great @navis 
    
    The Spark portion looks fine. I'll leave the updates to ColumnInfo for @ravimagham @JamesRTaylor et. al. to review


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---