You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metamodel.apache.org by jhorcicka <gi...@git.apache.org> on 2016/05/20 11:33:50 UTC

[GitHub] metamodel pull request: FixedWidth EBCDIC support

GitHub user jhorcicka opened a pull request:

    https://github.com/apache/metamodel/pull/103

    FixedWidth EBCDIC support

    METAMODEL-250 FixedWidth EBCDIC support
    
    * Refactoring of FixedWidthReader. 
    * BufferedReader was replaced by BufferedInputStream (support for files without new-line characters). 
    * New boolean options "headerPresent" and "eolPresent". 
    * Simple fixed-width EBCDIC file in test/resources. 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jhorcicka/metamodel master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metamodel/pull/103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #103
    
----
commit 480f71af754f08d53891f40d9ff9c27822fec62b
Author: jhorcicka <ja...@humaninference.com>
Date:   2016-05-20T11:26:23Z

    METAMODEL-250 FixedWidth EBCDIC support: headerPresent and eolPresent options; Reader was replaced by InputStream;

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #103: FixedWidth EBCDIC support

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/103
  
    Test failed on Travis, but it seems unrelated.
    (Although quite interesting test failure that I'm gonna report into JIRA to look at.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r66738122
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    I guess what I want to avoid is just that the two properties are redundant or can be conflicting. For instance of `columnNameLineNumber` is `0` and `headerPresent` is `true` then they communicate the opposite fact to the user.
    
    So maybe it's rather just a matter of `headerPresent` needing a rename? Can we name it in a way where it's meaning is less about the header and more about the padded whitespace maybe?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/metamodel/pull/103


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by jhorcicka <gi...@git.apache.org>.
Github user jhorcicka commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r66746248
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    Yes, let's give it a better name. I am not too much into "padding" since some padding is actually in all rows. Basically this "binary header" contains some additional data about the file, so what about `metadataPresent`? Would that be clear to the users?  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: FixedWidth EBCDIC support

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r64030560
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -54,45 +56,49 @@ public FixedWidthConfiguration(int fixedValueWidth) {
     	}
     
     	public FixedWidthConfiguration(int[] valueWidth) {
    -		this(DEFAULT_COLUMN_NAME_LINE, FileHelper.DEFAULT_ENCODING, valueWidth,
    -				false);
    +		this(DEFAULT_COLUMN_NAME_LINE, FileHelper.DEFAULT_ENCODING, valueWidth, false, false, true);
     	}
     
         public FixedWidthConfiguration(int columnNameLineNumber, String encoding, int fixedValueWidth) {
    -        this(columnNameLineNumber, encoding, fixedValueWidth, false);
    +        this(columnNameLineNumber, encoding, fixedValueWidth, false, false, true);
         }
     
         public FixedWidthConfiguration(int columnNameLineNumber, String encoding, int fixedValueWidth,
    -            boolean failOnInconsistentLineWidth) {
    +            boolean failOnInconsistentLineWidth, boolean headerPresent, boolean eolPresent) {
             this.encoding = encoding;
             this.fixedValueWidth = fixedValueWidth;
             this.columnNameLineNumber = columnNameLineNumber;
             this.failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    +		this.headerPresent = headerPresent;
    +		this.eolPresent = eolPresent;
             this.columnNamingStrategy = null;
             this.valueWidths = new int[0];
         }
     
         public FixedWidthConfiguration(int columnNameLineNumber, String encoding,
    -            int[] valueWidths, boolean failOnInconsistentLineWidth) {
    -        this(columnNameLineNumber, null, encoding, valueWidths, failOnInconsistentLineWidth);
    +            int[] valueWidths, boolean failOnInconsistentLineWidth, boolean headerPresent, boolean eolPresent) {
    +        this(columnNameLineNumber, null, encoding, valueWidths, failOnInconsistentLineWidth, headerPresent,
    +				eolPresent);
         }
         
         public FixedWidthConfiguration(int columnNameLineNumber, ColumnNamingStrategy columnNamingStrategy, String encoding,
    -            int[] valueWidths, boolean failOnInconsistentLineWidth) {
    +            int[] valueWidths, boolean failOnInconsistentLineWidth, boolean headerPresent, boolean eolPresent) {
             this.encoding = encoding;
             this.fixedValueWidth = -1;
             this.columnNameLineNumber = columnNameLineNumber;
             this.failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    +		this.headerPresent = headerPresent;
    +		this.eolPresent = eolPresent;
    --- End diff --
    
    And here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r66380408
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    This new property is redundant (and thus potentially 'dirty') because `columnNameLineNumber` holds the same metadata. If it is 0 (`NO_COLUMN_NAME_LINE `), then it means that there is no header.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r67372450
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    Hmm I'm really at a loss because deep inside I feel that this is a very particular thing to EBCDIC and therefore I cannot come up with a fitting generic name. And so maybe we should just give it a name that reveals that it is a particular thing. Such as `skipEbcdicHeader` or whatever. Or maybe it is less particular than that, but then I am still not sure what it is! :)
    
    I would almost consider subtyping the whole configuration class to have a EbcdicConfiguration class instead of FixedWidthConfiguration.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r66473039
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    (might be whitespace in the length of a line though) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: FixedWidth EBCDIC support

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r64159744
  
    --- Diff: fixedwidth/src/test/java/org/apache/metamodel/fixedwidth/FixedWidthConfigurationTest.java ---
    @@ -27,18 +27,18 @@
     	public void testToString() throws Exception {
     		assertEquals(
     				"FixedWidthConfiguration[encoding=UTF8, fixedValueWidth=10, valueWidths=[], columnNameLineNumber=1, failOnInconsistentLineWidth=true]",
    -				new FixedWidthConfiguration(1, "UTF8", 10, true).toString());
    +				new FixedWidthConfiguration(1, "UTF8", 10, true, false, true).toString());
     	}
     
     	public void testEquals() throws Exception {
     		FixedWidthConfiguration conf1 = new FixedWidthConfiguration(1, "UTF8",
    -				10, true);
    +				10, true, false, true);
    --- End diff --
    
    Please make sure there are overloaded constructors available for the old constructor signature. Then you also wont have to change this existing test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #103: FixedWidth EBCDIC support

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on the issue:

    https://github.com/apache/metamodel/pull/103
  
    +1 from me, let's merge soon so that we get this into the next release - it's been dangling for too long :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: FixedWidth EBCDIC support

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r64159697
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthReader.java ---
    @@ -31,156 +31,261 @@
      * Reader capable of separating values based on a fixed width setting.
      */
     final class FixedWidthReader implements Closeable {
    +    private static final int END_OF_STREAM = -1;
    +    private static final int LINE_FEED = '\n';
    +    private static final int CARRIAGE_RETURN = '\r';
    +    private final BufferedInputStream _stream;
    +    private final String _charsetName;
    +    private final int _fixedValueWidth;
    +    private final int[] _valueWidths;
    +    private int _valueIndex = 0;
    +    private final boolean _headerPresent;
    +    private final boolean _eolPresent;
    +    private final boolean _failOnInconsistentLineWidth;
    +    private final int _expectedLineLength;
    +    private final boolean _constantWidth;
    +    private volatile int _rowNumber;
    +    private boolean _headerSkipped;
     
    -	private final BufferedReader _reader;
    -	private final int _fixedValueWidth;
    -	private final int[] _valueWidths;
    -	private final boolean _failOnInconsistentLineWidth;
    -	private final int expectedLineLength;
    -	private final boolean constantWidth;
    -	private volatile int _rowNumber;
    -
    -	public FixedWidthReader(Reader reader, int fixedValueWidth,
    -			boolean failOnInconsistentLineWidth) {
    -		this(new BufferedReader(reader), fixedValueWidth,
    -				failOnInconsistentLineWidth);
    -	}
    -
    -	public FixedWidthReader(BufferedReader reader, int fixedValueWidth,
    -			boolean failOnInconsistentLineWidth) {
    -		_reader = reader;
    -		_fixedValueWidth = fixedValueWidth;
    -		_failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    -		_rowNumber = 0;
    -		_valueWidths = null;
    -
    -		constantWidth = true;
    -		expectedLineLength = -1;
    -	}
    -
    -	public FixedWidthReader(Reader reader, int[] valueWidths,
    -			boolean failOnInconsistentLineWidth) {
    -		this(new BufferedReader(reader), valueWidths,
    -				failOnInconsistentLineWidth);
    -	}
    -
    -	public FixedWidthReader(BufferedReader reader, int[] valueWidths,
    -			boolean failOnInconsistentLineWidth) {
    -		_reader = reader;
    -		_fixedValueWidth = -1;
    -		_valueWidths = valueWidths;
    -		_failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    -		_rowNumber = 0;
    -
    -		constantWidth = false;
    -		int expectedLineLength = 0;
    -		if (_fixedValueWidth == -1) {
    -			for (int i = 0; i < _valueWidths.length; i++) {
    -				expectedLineLength += _valueWidths[i];
    -			}
    -		}
    -		this.expectedLineLength = expectedLineLength;
    -	}
    -
    -	/***
    -	 * Reads the next line in the file.
    -	 * 
    -	 * @return an array of values in the next line, or null if the end of the
    -	 *         file has been reached.
    -	 * 
    -	 * @throws IllegalStateException
    -	 *             if an exception occurs while reading the file.
    -	 */
    -	public String[] readLine() throws IllegalStateException {
    -
    -		try {
    -			final List<String> values = new ArrayList<String>();
    -			final String line = _reader.readLine();
    -			if (line == null) {
    -				return null;
    -			}
    -
    -			StringBuilder nextValue = new StringBuilder();
    -
    -			int valueIndex = 0;
    -
    -			final CharacterIterator it = new StringCharacterIterator(line);
    -			for (char c = it.first(); c != CharacterIterator.DONE; c = it
    -					.next()) {
    -				nextValue.append(c);
    -
    -				final int valueWidth;
    -				if (constantWidth) {
    -					valueWidth = _fixedValueWidth;
    -				} else {
    -					if (valueIndex >= _valueWidths.length) {
    -						if (_failOnInconsistentLineWidth) {
    -							String[] result = values.toArray(new String[values
    -									.size()]);
    -							throw new InconsistentValueWidthException(result,
    -									line, _rowNumber + 1);
    -						} else {
    -							// silently ignore the inconsistency
    -							break;
    -						}
    -					}
    -					valueWidth = _valueWidths[valueIndex];
    -				}
    -
    -				if (nextValue.length() == valueWidth) {
    -					// write the value
    -					values.add(nextValue.toString().trim());
    -					nextValue = new StringBuilder();
    -					valueIndex++;
    -				}
    -			}
    -
    -			if (nextValue.length() > 0) {
    -				values.add(nextValue.toString().trim());
    -			}
    -
    -			String[] result = values.toArray(new String[values.size()]);
    -
    -			if (!_failOnInconsistentLineWidth && !constantWidth) {
    -				if (result.length != _valueWidths.length) {
    -					String[] correctedResult = new String[_valueWidths.length];
    -					for (int i = 0; i < result.length
    -							&& i < _valueWidths.length; i++) {
    -						correctedResult[i] = result[i];
    -					}
    -					result = correctedResult;
    -				}
    -			}
    -
    -			if (_failOnInconsistentLineWidth) {
    -				_rowNumber++;
    -				if (constantWidth) {
    -					if (line.length() % _fixedValueWidth != 0) {
    -						throw new InconsistentValueWidthException(result, line,
    -								_rowNumber);
    -					}
    -				} else {
    -					if (result.length != values.size()) {
    -						throw new InconsistentValueWidthException(result, line,
    -								_rowNumber);
    -					}
    -
    -					if (line.length() != expectedLineLength) {
    -						throw new InconsistentValueWidthException(result, line,
    -								_rowNumber);
    -					}
    -				}
    -			}
    -
    -			return result;
    -		} catch (IOException e) {
    -			throw new IllegalStateException(e);
    -		}
    -	}
    -
    -	@Override
    -	public void close() throws IOException {
    -		_reader.close();
    -	}
    +    public FixedWidthReader(InputStream stream, String charsetName, int fixedValueWidth,
    +            boolean failOnInconsistentLineWidth, boolean headerPresent, boolean eolPresent) {
    +        this(new BufferedInputStream(stream), charsetName, fixedValueWidth, failOnInconsistentLineWidth, headerPresent,
    +                eolPresent);
    +    }
     
    +    public FixedWidthReader(BufferedInputStream stream, String charsetName, int fixedValueWidth,
    +            boolean failOnInconsistentLineWidth, boolean headerPresent, boolean eolPresent) {
    +        _stream = stream;
    +        _charsetName = charsetName;
    +        _fixedValueWidth = fixedValueWidth;
    +        _failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    +        _headerPresent = headerPresent;
    +        _eolPresent = eolPresent;
    +        _rowNumber = 0;
    +        _valueWidths = null;
    +        _constantWidth = true;
    +        _expectedLineLength = -1;
    +    }
    +
    +    public FixedWidthReader(InputStream stream, String charsetName, int[] valueWidths,
    +            boolean failOnInconsistentLineWidth, boolean headerPresent, boolean eolPresent) {
    +        this(new BufferedInputStream(stream), charsetName, valueWidths, failOnInconsistentLineWidth, headerPresent,
    +                eolPresent);
    +    }
    +
    +    public FixedWidthReader(BufferedInputStream stream, String charsetName, int[] valueWidths,
    +            boolean failOnInconsistentLineWidth, boolean headerPresent, boolean eolPresent) {
    +        _stream = stream;
    +        _charsetName = charsetName;
    +        _fixedValueWidth = -1;
    +        _valueWidths = valueWidths;
    +        _failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    +        _headerPresent = headerPresent;
    +        _eolPresent = eolPresent;
    +        _rowNumber = 0;
    +        _constantWidth = false;
    +        int expectedLineLength = 0;
    +
    +        for (int i = 0; i < _valueWidths.length; i++) {
    +            expectedLineLength += _valueWidths[i];
    +        }
    +
    +        _expectedLineLength = expectedLineLength;
    +    }
    +
    +    /**
    +     * This reads and returns the next record from the file. Usually, it is a line but in case the new line characters
    +     * are not present, the length of the content depends on the column-widths setting.
    +     *
    +     * @return an array of values in the next line, or null if the end of the file has been reached.
    +     * @throws IllegalStateException if an exception occurs while reading the file.
    +     */
    +    public String[] readLine() throws IllegalStateException {
    +        try {
    +            if (shouldSkipHeader()) {
    +                skipHeader();
    +            }
    +
    +            return getValues();
    +        } catch (IOException e) {
    +            throw new IllegalStateException(e);
    +        }
    +    }
    +
    +    private boolean shouldSkipHeader() {
    +        return (_headerPresent && !_headerSkipped);
    +    }
    +
    +    private String[] getValues() throws IOException {
    +        final List<String> values = new ArrayList<>();
    +        final String singleRecordData = readSingleRecordData();
    +
    +        if (singleRecordData == null) {
    +            return null;
    +        }
    +
    +        processSingleRecordData(singleRecordData, values);
    +        String[] result = values.toArray(new String[values.size()]);
    +
    +        if (!_failOnInconsistentLineWidth && !_constantWidth) {
    +            result = correctResult(result);
    +        }
    +
    +        if (_failOnInconsistentLineWidth) {
    +            throwInconsistentValueException(singleRecordData, result, values.size());
    +        }
    +
    +        return result;
    +    }
    +
    +    private void throwInconsistentValueException(String singleRecordData, String[] result, int valuesSize) {
    +        InconsistentValueWidthException inconsistentValueException =
    +                buildInconsistentValueWidthException(singleRecordData, result, valuesSize);
    +
    +        if (inconsistentValueException != null) {
    +            throw inconsistentValueException;
    +        }
    +    }
    +
    +    private void processSingleRecordData(final String singleRecordData, final List<String> values) {
    +        StringBuilder nextValue = new StringBuilder();
    +        final CharacterIterator it = new StringCharacterIterator(singleRecordData);
    +        _valueIndex = 0;
    +
    +        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
    +            processCharacter(c, nextValue, values, singleRecordData);
    +        }
    +
    +        if (nextValue.length() > 0) {
    +            addNewValueIfAppropriate(values, nextValue);
    +        }
    +    }
    +
    +    private String readSingleRecordData() throws IOException {
    +        if (isEOLAvailable()) {
    +            StringBuilder line = new StringBuilder();
    +            int ch;
    +
    +            for (ch = _stream.read(); !isEndingCharacter(ch); ch = _stream.read()) {
    +                line.append((char) ch);
    +            }
    +
    +            if (ch == CARRIAGE_RETURN) {
    +                readLineFeedIfFollows();
    +            }
    +
    +            return (line.length()) > 0 ? line.toString() : null;
    +        } else {
    +            byte[] buffer = new byte[_expectedLineLength];
    +            int bytesRead = _stream.read(buffer, 0, _expectedLineLength);
    +
    +            if (bytesRead < 0) {
    +                return null;
    +            }
    +
    +            return new String(buffer, _charsetName);
    +        }
    +    }
    +
    +    private void readLineFeedIfFollows() throws IOException {
    +        _stream.mark(1);
    +
    +        if (_stream.read() != LINE_FEED) {
    +            _stream.reset();
    +        }
    +    }
    +
    +    private boolean isEndingCharacter(int ch) {
    +        return (ch == CARRIAGE_RETURN || ch == LINE_FEED || ch == END_OF_STREAM);
    +    }
    +
    +    private boolean isEOLAvailable() {
    +        return _eolPresent;
    +    }
    +
    +    private void processCharacter(char c, StringBuilder nextValue, List<String> values, String recordData) {
    +        nextValue.append(c);
    +        final int valueWidth = getValueWidth(values, recordData);
    +
    +        if (nextValue.length() == valueWidth) {
    +            addNewValueIfAppropriate(values, nextValue);
    +            nextValue.setLength(0); // clear the buffer
    +
    +            if (_valueWidths != null) {
    +                _valueIndex = (_valueIndex + 1) % _valueWidths.length;
    +            }
    +        }
    +    }
    +
    +    private int getValueWidth(List<String> values, String recordData) {
    +        if (_constantWidth) {
    +            return _fixedValueWidth;
    +        } else {
    +            if (_valueIndex >= _valueWidths.length) {
    +                if (_failOnInconsistentLineWidth) {
    +                    String[] result = values.toArray(new String[values.size()]);
    +                    throw new InconsistentValueWidthException(result, recordData, _rowNumber + 1);
    +                } else {
    +                    return -1; // silently ignore the inconsistency
    +                }
    +            }
    +
    +            return _valueWidths[_valueIndex];
    +        }
    +    }
    +
    +    private void addNewValueIfAppropriate(List<String> values, StringBuilder nextValue) {
    +        if (_valueWidths != null) {
    +            if (values.size() < _valueWidths.length) {
    +                values.add(nextValue.toString().trim());
    +            }
    +        } else {
    +            values.add(nextValue.toString().trim());
    +        }
    +    }
    +
    +    private String[] correctResult(String[] result) {
    +        if (result.length != _valueWidths.length) {
    +            String[] correctedResult = new String[_valueWidths.length];
    +
    +            for (int i = 0; i < result.length && i < _valueWidths.length; i++) {
    +                correctedResult[i] = result[i];
    +            }
    +
    +            result = correctedResult;
    +        }
    +
    +        return result;
    +    }
    +
    +    private InconsistentValueWidthException buildInconsistentValueWidthException(String recordData, String[] result,
    --- End diff --
    
    This method seems backwards to me. It is called "build[something]" but will usually (in the good cases) return null. And then it is evaluated later on to be thrown if it is not null. Why this pattern is needed I don't know. Seems a much more normal way to do it was to have something like a "validate[something]" which will throw the exception if it is desirable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: FixedWidth EBCDIC support

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r64030568
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -103,6 +109,8 @@ public FixedWidthConfiguration(String encoding, List<FixedWidthColumnSpec> colum
                 valueWidths[i] = columnSpecs.get(i).getWidth();
             }
             this.failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    +		this.headerPresent = headerPresent;
    +		this.eolPresent = eolPresent;
    --- End diff --
    
    And here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r66472895
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    I'm not sure, but I think the "lines"  are not real lines, but the EBCDIC header followed by lots of whitespace, and that it discards until it sees the real data 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: FixedWidth EBCDIC support

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r64030531
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -54,45 +56,49 @@ public FixedWidthConfiguration(int fixedValueWidth) {
     	}
     
     	public FixedWidthConfiguration(int[] valueWidth) {
    -		this(DEFAULT_COLUMN_NAME_LINE, FileHelper.DEFAULT_ENCODING, valueWidth,
    -				false);
    +		this(DEFAULT_COLUMN_NAME_LINE, FileHelper.DEFAULT_ENCODING, valueWidth, false, false, true);
     	}
     
         public FixedWidthConfiguration(int columnNameLineNumber, String encoding, int fixedValueWidth) {
    -        this(columnNameLineNumber, encoding, fixedValueWidth, false);
    +        this(columnNameLineNumber, encoding, fixedValueWidth, false, false, true);
         }
     
         public FixedWidthConfiguration(int columnNameLineNumber, String encoding, int fixedValueWidth,
    -            boolean failOnInconsistentLineWidth) {
    +            boolean failOnInconsistentLineWidth, boolean headerPresent, boolean eolPresent) {
             this.encoding = encoding;
             this.fixedValueWidth = fixedValueWidth;
             this.columnNameLineNumber = columnNameLineNumber;
             this.failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    +		this.headerPresent = headerPresent;
    +		this.eolPresent = eolPresent;
    --- End diff --
    
    Something weird happening with the whitespace here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel issue #103: FixedWidth EBCDIC support

Posted by jhorcicka <gi...@git.apache.org>.
Github user jhorcicka commented on the issue:

    https://github.com/apache/metamodel/pull/103
  
    I separated EbcdicConfiguration (extends FixedWidthConfiguration) and EbcdicReader (extends FixedWidthReader). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request: FixedWidth EBCDIC support

Posted by LosD <gi...@git.apache.org>.
Github user LosD commented on the pull request:

    https://github.com/apache/metamodel/pull/103#issuecomment-220587187
  
    Travis failure seems unrelated (a failed Cassandra startup)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r67560331
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    I would actually then prefer that we subclass it. Remove the 'final' part of the FixedWidthConfiguration and make an EbcdicConfiguration class that extends it. That way we don't introduce something into this class that is still not a very well-defined thing. For that you have my green :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by jhorcicka <gi...@git.apache.org>.
Github user jhorcicka commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r73650071
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthReader.java ---
    @@ -18,78 +18,235 @@
      */
     package org.apache.metamodel.fixedwidth;
     
    -import java.io.BufferedReader;
    +import java.io.BufferedInputStream;
     import java.io.Closeable;
     import java.io.IOException;
    -import java.io.Reader;
    +import java.io.InputStream;
    +import java.text.CharacterIterator;
    +import java.text.StringCharacterIterator;
    +import java.util.ArrayList;
    +import java.util.List;
     
     /**
      * Reader capable of separating values based on a fixed width setting.
      */
    -final public class FixedWidthReader implements Closeable {
    -
    -	private final BufferedReader _reader;
    -	private final FixedWidthLineParser _parser; 
    -
    -	public FixedWidthReader(Reader reader, int fixedValueWidth,
    -			boolean failOnInconsistentLineWidth) {
    -		this(new BufferedReader(reader), fixedValueWidth,
    -				failOnInconsistentLineWidth);
    -	}
    -
    -	public FixedWidthReader(BufferedReader reader, int fixedValueWidth,
    -			boolean failOnInconsistentLineWidth) {
    -		_reader = reader;
    -        final FixedWidthConfiguration fixedWidthConfiguration = new FixedWidthConfiguration(
    -                FixedWidthConfiguration.NO_COLUMN_NAME_LINE, null, fixedValueWidth, failOnInconsistentLineWidth);
    -        _parser = new FixedWidthLineParser(fixedWidthConfiguration, -1, 0);
    -	}
    -
    -	public FixedWidthReader(Reader reader, int[] valueWidths,
    -			boolean failOnInconsistentLineWidth) {
    -		this(new BufferedReader(reader), valueWidths,
    -				failOnInconsistentLineWidth);
    -	}
    -
    -	public FixedWidthReader(BufferedReader reader, int[] valueWidths,
    -			boolean failOnInconsistentLineWidth) {
    -		_reader = reader;
    -		int fixedValueWidth = -1;
    -		int expectedLineLength = 0;
    -		if (fixedValueWidth == -1) {
    -			for (int i = 0; i < valueWidths.length; i++) {
    -				expectedLineLength += valueWidths[i];
    -			}
    -		}
    -        final FixedWidthConfiguration fixedWidthConfiguration = new FixedWidthConfiguration(
    -                FixedWidthConfiguration.NO_COLUMN_NAME_LINE, null, valueWidths, failOnInconsistentLineWidth);
    -        _parser = new FixedWidthLineParser(fixedWidthConfiguration, expectedLineLength, 0);
    -	}
    -
    -	
    -	/***
    -	 * Reads the next line in the file.
    -	 * 
    -	 * @return an array of values in the next line, or null if the end of the
    -	 *         file has been reached.
    -	 * 
    -	 * @throws IllegalStateException
    -	 *             if an exception occurs while reading the file.
    -	 */
    -	public String[] readLine() throws IllegalStateException {
    -        String line;
    +class FixedWidthReader implements Closeable {
    +    private static final int END_OF_STREAM = -1;
    +    private static final int LINE_FEED = '\n';
    +    private static final int CARRIAGE_RETURN = '\r';
    +    
    +    protected final String _charsetName;
    +    private final int _fixedValueWidth;
    +    private final int[] _valueWidths;
    +    private int _valueIndex = 0;
    +    private final boolean _failOnInconsistentLineWidth;
    +    private final boolean _constantWidth;
    +    private volatile int _rowNumber;
    +    protected final BufferedInputStream _stream;
    +    protected final int _expectedLineLength;
    +
    +    public FixedWidthReader(InputStream stream, String charsetName, int fixedValueWidth,
    +            boolean failOnInconsistentLineWidth) {
    +        this(new BufferedInputStream(stream), charsetName, fixedValueWidth, failOnInconsistentLineWidth);
    +    }
    +
    +    private FixedWidthReader(BufferedInputStream stream, String charsetName, int fixedValueWidth,
    +            boolean failOnInconsistentLineWidth) {
    +        _stream = stream;
    +        _charsetName = charsetName;
    +        _fixedValueWidth = fixedValueWidth;
    +        _failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    +        _rowNumber = 0;
    +        _valueWidths = null;
    +        _constantWidth = true;
    +        _expectedLineLength = -1;
    +    }
    +
    +    public FixedWidthReader(InputStream stream, String charsetName, int[] valueWidths,
    +            boolean failOnInconsistentLineWidth) {
    +        this(new BufferedInputStream(stream), charsetName, valueWidths, failOnInconsistentLineWidth);
    +    }
    +
    +    FixedWidthReader(BufferedInputStream stream, String charsetName, int[] valueWidths,
    +            boolean failOnInconsistentLineWidth) {
    +        _stream = stream;
    +        _charsetName = charsetName;
    +        _fixedValueWidth = -1;
    +        _valueWidths = valueWidths;
    +        _failOnInconsistentLineWidth = failOnInconsistentLineWidth;
    +        _rowNumber = 0;
    +        _constantWidth = false;
    +        int expectedLineLength = 0;
    +
    +        for (final int _valueWidth : _valueWidths) {
    +            expectedLineLength += _valueWidth;
    +        }
    +
    +        _expectedLineLength = expectedLineLength;
    +    }
    +
    +    /**
    +     * This reads and returns the next record from the file. Usually, it is a line but in case the new line characters
    +     * are not present, the length of the content depends on the column-widths setting.
    +     *
    +     * @return an array of values in the next line, or null if the end of the file has been reached.
    +     * @throws IllegalStateException if an exception occurs while reading the file.
    +     */
    +    public String[] readLine() throws IllegalStateException {
             try {
    -            line = _reader.readLine();
    -            return _parser.parseLine(line);
    +            beforeReadLine();
    +            _rowNumber++;
    +            return getValues();
             } catch (IOException e) {
                 throw new IllegalStateException(e);
             }
    -	}
    -	
    +    }
    +
    +    /**
    +     * Empty hook that enables special behavior in sub-classed readers (by overriding this method). 
    +     */
    +    protected void beforeReadLine() {
    +        return;
    +    }
    +
    +    private String[] getValues() throws IOException {
    +        final List<String> values = new ArrayList<>();
    +        final String singleRecordData = readSingleRecordData();
    +
    +        if (singleRecordData == null) {
    +            return null;
    +        }
    +
    +        processSingleRecordData(singleRecordData, values);
    +        String[] result = values.toArray(new String[values.size()]);
    +
    +        if (!_failOnInconsistentLineWidth && !_constantWidth) {
    +            result = correctResult(result);
    +        }
    +
    +        validateConsistentValue(singleRecordData, result, values.size());
    +
    +        return result;
    +    }
    +
    +    private void validateConsistentValue(String recordData, String[] result, int valuesSize) {
    +        if (!_failOnInconsistentLineWidth) {
    +            return;
    +        }
    +
    +        InconsistentValueWidthException inconsistentValueException = null;
    +
    +        if (_constantWidth) {
    +            if (recordData.length() % _fixedValueWidth != 0) {
    +                inconsistentValueException = new InconsistentValueWidthException(result, recordData, _rowNumber);
    +            }
    +        } else if (result.length != valuesSize || recordData.length() != _expectedLineLength) {
    +            inconsistentValueException = new InconsistentValueWidthException(result, recordData, _rowNumber);
    +        }
    +
    +        if (inconsistentValueException != null) {
    +            throw inconsistentValueException;
    +        }
    +    }
    +
    +    private void processSingleRecordData(final String singleRecordData, final List<String> values) {
    +        StringBuilder nextValue = new StringBuilder();
    +        final CharacterIterator it = new StringCharacterIterator(singleRecordData);
    +        _valueIndex = 0;
    +
    +        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
    +            processCharacter(c, nextValue, values, singleRecordData);
    +        }
    +
    +        if (nextValue.length() > 0) {
    +            addNewValueIfAppropriate(values, nextValue);
    +        }
    +    }
    +
    +    String readSingleRecordData() throws IOException {
    +        StringBuilder line = new StringBuilder();
    +        int ch;
    +
    +        for (ch = _stream.read(); !isEndingCharacter(ch); ch = _stream.read()) {
    +            line.append((char) ch);
    +        }
    +
    +        if (ch == CARRIAGE_RETURN) {
    +            readLineFeedIfFollows();
    +        }
    +
    +        return (line.length()) > 0 ? line.toString() : null;
    --- End diff --
    
    Here, the returned String is charset-less which is actually a bug for diacritics input. PR for this bug fix is here: https://github.com/apache/metamodel/pull/121


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by kaspersorensen <gi...@git.apache.org>.
Github user kaspersorensen commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r66470815
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    Ah right, but maybe in that case you would then simply set columnNameLineNumber to 2 because its on the second line?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by jhorcicka <gi...@git.apache.org>.
Github user jhorcicka commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r66390266
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    I am not so sure about that. I can image it is confusing (let's create a better name then) but this is something else. In EBCDIC files there might be a "binary header" with some meta-data that we want to skip before we can read actual data. And the actual data can (perhaps) contain the "normal" header with column names. I am not sure if this can really happen since there are "format" files for structure and column names definition.  
    
    Example:
    1. HDR 2016-06-09 ... // this is "headerPresent" 
    [2. NAME, SURNAME, YEAR, ...]? // this is "columnNameLineNumber"
    3. John, Doe, 1980, ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by jhorcicka <gi...@git.apache.org>.
Github user jhorcicka commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r66568320
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    @LosD: I do it generally by skipping whitespace until I reach data but in the case of "the one" file we had it was just the length of one record. 
    
    @kaspersorensen: Yes, I could set columnNameLineNumber to 2 but I am not sure if "a common user" would know to do this too. Unless (s)he knows the details of the file which I assume we can not expect. On the other hand if there is "column-names-header" then (s)he has to be aware of it anyway to use it properly. I don't have a strong opinion here and will go with "whatever" you advice. :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] metamodel pull request #103: FixedWidth EBCDIC support

Posted by jhorcicka <gi...@git.apache.org>.
Github user jhorcicka commented on a diff in the pull request:

    https://github.com/apache/metamodel/pull/103#discussion_r67468607
  
    --- Diff: fixedwidth/src/main/java/org/apache/metamodel/fixedwidth/FixedWidthConfiguration.java ---
    @@ -33,66 +33,74 @@
     /**
      * Configuration of metadata about a fixed width values datacontext.
      */
    -public final class FixedWidthConfiguration extends BaseObject implements
    -		Serializable {
    +public final class FixedWidthConfiguration extends BaseObject implements Serializable {
     
    -	private static final long serialVersionUID = 1L;
    +    private static final long serialVersionUID = 1L;
     
    -	public static final int NO_COLUMN_NAME_LINE = 0;
    -	public static final int DEFAULT_COLUMN_NAME_LINE = 1;
    +    public static final int NO_COLUMN_NAME_LINE = 0;
    +    public static final int DEFAULT_COLUMN_NAME_LINE = 1;
     
    -	private final String encoding;
    -	private final int fixedValueWidth;
    -	private final int[] valueWidths;
    -	private final int columnNameLineNumber;
    -	private final boolean failOnInconsistentLineWidth;
    -	private final ColumnNamingStrategy columnNamingStrategy;
    +    private final String encoding;
    +    private final int fixedValueWidth;
    +    private final int[] valueWidths;
    +    private final int columnNameLineNumber;
    +    private final boolean failOnInconsistentLineWidth;
    +    private final boolean headerPresent;
    --- End diff --
    
    I would go with ```skipEbcdicHeader``` then. I also feel that EBCDIC is something special in the context of fixed-width but let's solve this within a new issue as a refactoring part. There already are other projects waiting for this to be merged and this kind of communication takes a lot of time. 
    
    So, let's change the name to ```skipEbcdicHeader```, focus on other aspects of this PR and then merge it. Then we can start a new discussion about separation of EbcdicConfiguration. What do you think? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---