Posted to issues@spark.apache.org by "melin (Jira)" <ji...@apache.org> on 2022/02/14 15:48:00 UTC

[jira] [Created] (SPARK-38209) Selectively include EXTERNAL TABLE source files via REGEX

melin created SPARK-38209:
-----------------------------

             Summary: Selectively include EXTERNAL TABLE source files via REGEX
                 Key: SPARK-38209
                 URL: https://issues.apache.org/jira/browse/SPARK-38209
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: melin


https://issues.apache.org/jira/browse/HIVE-951

CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular expression.
CREATE EXTERNAL TABLE was designed to allow users to access data that exists outside of Hive, and
it currently assumes that all of the files located under the supplied path should be included
in the new table. Users frequently encounter directories that contain multiple datasets, or data
with heterogeneous schemas, and it's often impractical or impossible to change the directory
layout to meet the requirements of CREATE EXTERNAL TABLE. A good example of this problem is
creating an external table over the contents of an S3 bucket.

One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
as follows:

CREATE EXTERNAL TABLE
...
LOCATION path [file_regex]
...
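To make the intended semantics of the optional file_regex concrete, here is a minimal Python sketch of the file selection step. It is not Hive or Spark code. The function name select_files is hypothetical, and it assumes two choices the proposal leaves open: the regex is matched against each file's path relative to LOCATION, and the whole relative path must match (re.fullmatch). When no regex is given, every file under the location is kept, which is the current behavior.

```python
import re


def select_files(paths, location, file_regex=None):
    """Hypothetical selection rule for LOCATION path [file_regex].

    Assumptions (not defined by Hive or Spark): the regex is applied to the
    path relative to `location`, and it must match that whole relative path.
    """
    prefix = location.rstrip("/") + "/"
    pattern = re.compile(file_regex) if file_regex is not None else None
    selected = []
    for path in paths:
        if not path.startswith(prefix):
            continue  # file is not under the table location at all
        relative = path[len(prefix):]
        # No regex: include everything, as CREATE EXTERNAL TABLE does today.
        if pattern is None or pattern.fullmatch(relative):
            selected.append(path)
    return selected


# A bucket prefix holding two unrelated datasets plus a marker file.
files = [
    "s3://bucket/data/clicks-2022-02-01.csv",
    "s3://bucket/data/clicks-2022-02-02.csv",
    "s3://bucket/data/users.json",
    "s3://bucket/data/_SUCCESS",
]

print(select_files(files, "s3://bucket/data", r"clicks-.*\.csv"))
# ['s3://bucket/data/clicks-2022-02-01.csv', 's3://bucket/data/clicks-2022-02-02.csv']
```

Matching the full relative path (rather than only the file name) is one possible design: it lets a single regex also select by subdirectory, at the cost of patterns having to account for any "/" in nested paths.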



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
