Posted to issues@flink.apache.org by twalthr <gi...@git.apache.org> on 2016/07/27 13:52:50 UTC

[GitHub] flink pull request #2304: [FLINK-4268] [core] Add parsers for BigDecimal/Big...

GitHub user twalthr opened a pull request:

    https://github.com/apache/flink/pull/2304

    [FLINK-4268] [core] Add parsers for BigDecimal/BigInteger

    Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
    If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful description of your changes.
    
    - [x] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the JIRA id)
    
    - [x] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [x] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis build has passed
    
    This PR is similar to FLINK-4248. It adds parsers for the basic types `BigDecimal` and `BigInteger`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink FLINK-4268

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2304.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2304
    
----
commit 8e5a55372e30c70d00e84e22ff153d3526918782
Author: twalthr <tw...@apache.org>
Date:   2016-07-27T13:48:48Z

    [FLINK-4268] [core] Add a parsers for BigDecimal/BigInteger

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2304: [FLINK-4268] [core] Add parsers for BigDecimal/BigInteger

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/2304
  
    @fhueske I improved the big decimal parsing. Now fewer objects are created. I hope I didn't introduce any new bugs.



[GitHub] flink issue #2304: [FLINK-4268] [core] Add parsers for BigDecimal/BigInteger

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/2304
  
    Thanks @twalthr, the PR looks good.
    I only added a hint on how String creations can be avoided.
    
    Good to merge otherwise.



[GitHub] flink pull request #2304: [FLINK-4268] [core] Add parsers for BigDecimal/Big...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2304#discussion_r79572690
  
    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/BigDecParser.java ---
    @@ -55,9 +56,20 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, B
     			return -1;
     		}
     
    -		String str = new String(bytes, startPos, i - startPos);
     		try {
    -			this.result = new BigDecimal(str);
    +			final int length = i - startPos;
    +			if (reuse == null || reuse.length != length) {
    +				reuse = new char[length];
    +			}
    +			for (int j = 0; j < length; j++) {
    +				final byte b = bytes[startPos + j];
    +				if ((b < '0' || b > '9') && b != '-' && b != '+' && b != '.' && b != 'E' && b != 'e') {
    +					throw new NumberFormatException();
    +				}
    +				reuse[j] = (char) bytes[startPos + j];
    +			}
    +
    +			this.result = new BigDecimal(reuse);
    --- End diff --
    
    We can use the `BigDecimal(char[] in, int offset, int len)` constructor and keep reusing the `reuse` array even when it is larger than the provided input.
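A minimal sketch of this suggestion, not the code that was eventually merged: the class name `ReusableBufferSketch` and the method `parse` are hypothetical, and only the `BigDecimal(char[] in, int offset, int len)` constructor and the character check are taken from the thread. Because the constructor reads exactly `len` characters from `offset`, stale characters left in an oversized buffer by an earlier, longer field are never looked at.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

class ReusableBufferSketch {

    // Hypothetical scratch buffer; it is replaced only when a field needs more room.
    private char[] reuse;

    BigDecimal parse(byte[] bytes, int startPos, int length) {
        if (reuse == null || reuse.length < length) {
            reuse = new char[length];
        }
        for (int j = 0; j < length; j++) {
            final byte b = bytes[startPos + j];
            // Same character whitelist as in the diff under review.
            if ((b < '0' || b > '9') && b != '-' && b != '+' && b != '.' && b != 'E' && b != 'e') {
                throw new NumberFormatException();
            }
            reuse[j] = (char) b;
        }
        // Only reuse[0..length) is read, so leftovers past that range are ignored.
        return new BigDecimal(reuse, 0, length);
    }

    public static void main(String[] args) {
        ReusableBufferSketch parser = new ReusableBufferSketch();
        byte[] data = "12345.678|-4.5".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parser.parse(data, 0, 9));  // long field sizes the buffer
        System.out.println(parser.parse(data, 10, 4)); // shorter field reuses it
    }
}
```

The second call parses a four-character field out of a nine-character buffer without allocating a new array.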



[GitHub] flink pull request #2304: [FLINK-4268] [core] Add parsers for BigDecimal/Big...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2304#discussion_r79573034
  
    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/BigDecParser.java ---
    @@ -120,7 +132,14 @@ public static final BigDecimal parseField(byte[] bytes, int startPos, int length
     			throw new NumberFormatException("There is leading or trailing whitespace in the numeric field.");
     		}
     
    -		String str = new String(bytes, startPos, i);
    -		return new BigDecimal(str);
    +		final char[] reuse = new char[i];
    --- End diff --
    
    `reuse` is not actually reused here; a new array is created on every method invocation.



[GitHub] flink pull request #2304: [FLINK-4268] [core] Add parsers for BigDecimal/Big...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2304#discussion_r75459165
  
    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/BigDecParser.java ---
    @@ -0,0 +1,126 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +
    +package org.apache.flink.types.parser;
    +
    +import java.math.BigDecimal;
    +import org.apache.flink.annotation.PublicEvolving;
    +
    +/**
    + * Parses a text field into a {@link java.math.BigDecimal}.
    + */
    +@PublicEvolving
    +public class BigDecParser extends FieldParser<BigDecimal> {
    +
    +	private static final BigDecimal BIG_DECIMAL_INSTANCE = BigDecimal.ZERO;
    +
    +	private BigDecimal result;
    +
    +	@Override
    +	public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, BigDecimal reusable) {
    +		int i = startPos;
    +
    +		final int delimLimit = limit - delimiter.length + 1;
    +
    +		while (i < limit) {
    +			if (i < delimLimit && delimiterNext(bytes, i, delimiter)) {
    +				if (i == startPos) {
    +					setErrorState(ParseErrorState.EMPTY_STRING);
    +					return -1;
    +				}
    +				break;
    +			}
    +			i++;
    +		}
    +
    +		if (i > startPos &&
    +				(Character.isWhitespace(bytes[startPos]) || Character.isWhitespace(bytes[(i - 1)]))) {
    +			setErrorState(ParseErrorState.NUMERIC_VALUE_ILLEGAL_CHARACTER);
    +			return -1;
    +		}
    +
    +		String str = new String(bytes, startPos, i - startPos);
    --- End diff --
    
    Instead of creating String instances, we could copy the relevant bytes into a reusable `char[]` and create the `BigDecimal` with the `BigDecimal(char[] in, int offset, int len)` constructor.
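The two approaches can be compared side by side. This is a sketch under stated assumptions: the class `StringVsCharArraySketch` and both method names are hypothetical, and the explicit `US_ASCII` charset is used here only to make the String path deterministic (the diff uses the platform default charset). For ASCII numeric input, widening each byte to a `char` yields the same characters a String decode would, so both paths produce equal `BigDecimal` values.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

class StringVsCharArraySketch {

    // Original approach: one intermediate String is allocated per parsed field.
    static BigDecimal viaString(byte[] bytes, int startPos, int length) {
        return new BigDecimal(new String(bytes, startPos, length, StandardCharsets.US_ASCII));
    }

    // Suggested approach: copy into a caller-supplied scratch array
    // that has at least `length` slots, then parse that range directly.
    static BigDecimal viaChars(byte[] bytes, int startPos, int length, char[] scratch) {
        for (int j = 0; j < length; j++) {
            scratch[j] = (char) bytes[startPos + j];
        }
        return new BigDecimal(scratch, 0, length);
    }

    public static void main(String[] args) {
        byte[] data = "1.50E+3,42".getBytes(StandardCharsets.US_ASCII);
        char[] scratch = new char[16];
        BigDecimal a = viaString(data, 0, 7);
        BigDecimal b = viaChars(data, 0, 7, scratch);
        System.out.println(a.equals(b)); // same value and same scale
    }
}
```

The resulting `BigDecimal` is still a new object in both cases; the saving is limited to the intermediate String.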



[GitHub] flink pull request #2304: [FLINK-4268] [core] Add parsers for BigDecimal/Big...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/2304



[GitHub] flink pull request #2304: [FLINK-4268] [core] Add parsers for BigDecimal/Big...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2304#discussion_r79572080
  
    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/BigDecParser.java ---
    @@ -55,9 +56,20 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, B
     			return -1;
     		}
     
    -		String str = new String(bytes, startPos, i - startPos);
     		try {
    -			this.result = new BigDecimal(str);
    +			final int length = i - startPos;
    +			if (reuse == null || reuse.length != length) {
    --- End diff --
    
    Can we change this to `reuse.length < length`, so that a new array is only created when more space is needed?
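To illustrate the effect of the two conditions, here is a small sketch that counts array allocations for a sequence of field lengths. The class `GrowOnlyBufferSketch`, the method `allocations`, and the sample lengths are all made up for illustration; only the two comparisons `!=` and `<` come from the diff and the comment above.

```java
class GrowOnlyBufferSketch {

    /** Counts how many arrays are allocated for the given field lengths under each policy. */
    static int allocations(int[] fieldLengths, boolean growOnly) {
        char[] reuse = null;
        int count = 0;
        for (int length : fieldLengths) {
            final boolean needsNewArray = growOnly
                    ? (reuse == null || reuse.length < length)   // suggested condition
                    : (reuse == null || reuse.length != length); // condition in the diff
            if (needsNewArray) {
                reuse = new char[length];
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] lengths = {5, 3, 5, 8, 2, 8};
        System.out.println(allocations(lengths, false)); // prints 6
        System.out.println(allocations(lengths, true));  // prints 2
    }
}
```

With `!=`, every change in field length triggers an allocation; with `<`, the buffer is replaced only when a field is longer than any seen so far.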



[GitHub] flink pull request #2304: [FLINK-4268] [core] Add parsers for BigDecimal/Big...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2304#discussion_r75459189
  
    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/BigDecParser.java ---
    @@ -0,0 +1,126 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +
    +package org.apache.flink.types.parser;
    +
    +import java.math.BigDecimal;
    +import org.apache.flink.annotation.PublicEvolving;
    +
    +/**
    + * Parses a text field into a {@link java.math.BigDecimal}.
    + */
    +@PublicEvolving
    +public class BigDecParser extends FieldParser<BigDecimal> {
    +
    +	private static final BigDecimal BIG_DECIMAL_INSTANCE = BigDecimal.ZERO;
    +
    +	private BigDecimal result;
    +
    +	@Override
    +	public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, BigDecimal reusable) {
    +		int i = startPos;
    +
    +		final int delimLimit = limit - delimiter.length + 1;
    +
    +		while (i < limit) {
    +			if (i < delimLimit && delimiterNext(bytes, i, delimiter)) {
    +				if (i == startPos) {
    +					setErrorState(ParseErrorState.EMPTY_STRING);
    +					return -1;
    +				}
    +				break;
    +			}
    +			i++;
    +		}
    +
    +		if (i > startPos &&
    +				(Character.isWhitespace(bytes[startPos]) || Character.isWhitespace(bytes[(i - 1)]))) {
    +			setErrorState(ParseErrorState.NUMERIC_VALUE_ILLEGAL_CHARACTER);
    +			return -1;
    +		}
    +
    +		String str = new String(bytes, startPos, i - startPos);
    +		try {
    +			this.result = new BigDecimal(str);
    +			return (i == limit) ? limit : i + delimiter.length;
    +		} catch (NumberFormatException e) {
    +			setErrorState(ParseErrorState.NUMERIC_VALUE_FORMAT_ERROR);
    +			return -1;
    +		}
    +	}
    +
    +	@Override
    +	public BigDecimal createValue() {
    +		return BIG_DECIMAL_INSTANCE;
    +	}
    +
    +	@Override
    +	public BigDecimal getLastResult() {
    +		return this.result;
    +	}
    +
    +	/**
    +	 * Static utility to parse a field of type BigDecimal from a byte sequence that represents text
    +	 * characters
    +	 * (such as when read from a file stream).
    +	 *
    +	 * @param bytes    The bytes containing the text data that should be parsed.
    +	 * @param startPos The offset to start the parsing.
    +	 * @param length   The length of the byte sequence (counting from the offset).
    +	 * @return The parsed value.
    +	 * @throws NumberFormatException Thrown when the value cannot be parsed because the text 
    +	 * represents not a correct number.
    +	 */
    +	public static final BigDecimal parseField(byte[] bytes, int startPos, int length) {
    +		return parseField(bytes, startPos, length, (char) 0xffff);
    +	}
    +
    +	/**
    +	 * Static utility to parse a field of type BigDecimal from a byte sequence that represents text
    +	 * characters
    +	 * (such as when read from a file stream).
    +	 *
    +	 * @param bytes     The bytes containing the text data that should be parsed.
    +	 * @param startPos  The offset to start the parsing.
    +	 * @param length    The length of the byte sequence (counting from the offset).
    +	 * @param delimiter The delimiter that terminates the field.
    +	 * @return The parsed value.
    +	 * @throws NumberFormatException Thrown when the value cannot be parsed because the text 
    +	 * represents not a correct number.
    +	 */
    +	public static final BigDecimal parseField(byte[] bytes, int startPos, int length, char delimiter) {
    +		if (length <= 0) {
    +			throw new NumberFormatException("Invalid input: Empty string");
    +		}
    +		int i = 0;
    +		final byte delByte = (byte) delimiter;
    +
    +		while (i < length && bytes[startPos + i] != delByte) {
    +			i++;
    +		}
    +
    +		if (i > 0 &&
    +				(Character.isWhitespace(bytes[startPos]) || Character.isWhitespace(bytes[startPos + i - 1]))) {
    +			throw new NumberFormatException("There is leading or trailing whitespace in the numeric field.");
    +		}
    +
    +		String str = new String(bytes, startPos, i);
    --- End diff --
    
    Same as above: the String creation here could be avoided as well.



[GitHub] flink issue #2304: [FLINK-4268] [core] Add parsers for BigDecimal/BigInteger

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/2304
  
    Thanks again for the review @fhueske. I will address your comments and merge this.

