Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/23 15:11:20 UTC

[GitHub] [iceberg] RussellSpitzer opened a new issue #2360: Internal Tests Cannot use ORC or Avro Datasource Writers

RussellSpitzer opened a new issue #2360:
URL: https://github.com/apache/iceberg/issues/2360


   While working on the addFiles procedure I found that it is currently impossible to use Spark's Avro or ORC writers in our test code because of classpath issues.
   
   See
   
   https://github.com/apache/iceberg/blob/f0a6b717dbf662caa9c762e72c47715a12625647/spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java#L90-L128
   
   We should probably fix this so that the Spark sources for these modules work. Alternatively, we can change the code in these tests to manually generate ORC and Avro files, but that seems like a workaround for a classpath issue we should probably clean up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod edited a comment on issue #2360: Internal Tests Cannot use ORC or Avro Datasource Writers

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on issue #2360:
URL: https://github.com/apache/iceberg/issues/2360#issuecomment-805043527


   Yes, we faced the same issue: Hive exec uses a different `orc-core` version than `iceberg-orc`, so we were not able to pull both of them in and run unit tests that write to Hive ORC tables and Iceberg ORC tables at the same time. In a real cluster this problem goes away, because we shade the `orc-core` dependency used by `iceberg-orc` and include it in the runtime jar. As @pvary mentioned, our initial idea to resolve this was to create a new `bundled-orc` module in Iceberg (like the one for guava) to avoid any classpath conflicts with Hive. A similar approach could potentially resolve the problems for Spark too.
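
As an illustrative aside, here is a minimal sketch of how one might check which jar a class is actually loaded from when debugging this kind of `orc-core` version conflict on a test classpath. The `ClassOrigin` class and its method names are hypothetical and do not exist in Iceberg; it relies only on standard JDK calls, and `org.apache.orc.OrcFile` is used merely as an example class name to look up.

```java
import java.security.CodeSource;

public class ClassOrigin {
    // Returns the location (jar or directory) a class was loaded from,
    // or a placeholder when the JVM exposes no code source (e.g. JDK classes).
    public static String originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    // Looks a class up by name without initializing it and reports
    // its origin; a missing class is reported instead of thrown.
    public static String originOfName(String className) {
        try {
            ClassLoader loader = ClassOrigin.class.getClassLoader();
            return originOf(Class.forName(className, false, loader));
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Default to an ORC class; pass another fully qualified name to inspect it.
        String name = args.length > 0 ? args[0] : "org.apache.orc.OrcFile";
        System.out.println(name + " -> " + originOfName(name));
    }
}
```

Running this from a test's classpath would show whether the ORC classes resolve to the jar pulled in by Hive or Spark or to the one expected by `iceberg-orc`, which may help confirm the conflict before deciding between shading and restructuring the tests.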


[GitHub] [iceberg] pan3793 edited a comment on issue #2360: Internal Tests Cannot use ORC or Avro Datasource Writers

Posted by GitBox <gi...@apache.org>.
pan3793 edited a comment on issue #2360:
URL: https://github.com/apache/iceberg/issues/2360#issuecomment-962856290


   Assuming we do not intend to access Iceberg internal APIs in the test classes of the `iceberg-spark` and `iceberg-spark-extension` modules (correct me if I'm wrong), moving all test classes to the `iceberg-spark-runtime` module should solve this problem.
   
   This approach also solves #2382.


[GitHub] [iceberg] pvary commented on issue #2360: Internal Tests Cannot use ORC or Avro Datasource Writers

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2360:
URL: https://github.com/apache/iceberg/issues/2360#issuecomment-805030488


   We faced the same issue when we moved the mr module to the Hive repo. CC: @marton-bod
   First random idea: create bundled-orc / bundled-avro / bundled-parquet(?) modules, like we did with guava?

