You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by mi...@apache.org on 2023/02/08 17:25:37 UTC

[impala] branch master updated (1d05381b7 -> eca34cc98)

This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


    from 1d05381b7 IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
     new 7dcf80b32 IMPALA-10804: [DOCS] Document spill to remote storage
     new b858f2acd IMPALA-11883: Calculate erasure-coded bytes read directly
     new 8935c7590 IMPALA-11859: Add bytes-read-encrypted metric
     new eca34cc98 IMPALA-11892: Restore pkg_resources with Python 2

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/exec/hdfs-scan-node-base.cc                 |    7 +
 be/src/exec/hdfs-scan-node-base.h                  |    7 +-
 be/src/exec/orc/hdfs-orc-scanner.cc                |    5 +-
 be/src/runtime/io/hdfs-file-reader.cc              |   10 +-
 be/src/runtime/io/request-context.h                |    6 +-
 be/src/runtime/io/request-ranges.h                 |    7 +-
 be/src/runtime/io/scan-range.cc                    |    1 +
 be/src/scheduling/scheduler.cc                     |    2 +
 be/src/util/impalad-metrics.cc                     |    5 +
 be/src/util/impalad-metrics.h                      |    4 +
 common/fbs/CatalogObjects.fbs                      |    3 +
 common/protobuf/planner.proto                      |    3 +
 common/thrift/PlanNodes.thrift                     |    7 +-
 common/thrift/metrics.json                         |   10 +
 docs/topics/impala_disk_space.xml                  |  106 +
 .../org/apache/impala/catalog/FeIcebergTable.java  |    5 +-
 .../apache/impala/catalog/FileMetadataLoader.java  |    4 +-
 .../org/apache/impala/catalog/HdfsPartition.java   |   15 +-
 .../java/org/apache/impala/compat/HdfsShim.java    |   30 -
 .../org/apache/impala/planner/HdfsScanNode.java    |    1 +
 shell/impala-shell                                 |    6 +-
 shell/make_shell_tarball.sh                        |    6 +-
 shell/pkg_resources.py                             | 2700 ++++++++++++++++++++
 tests/query_test/test_io_metrics.py                |    6 +-
 24 files changed, 2900 insertions(+), 56 deletions(-)
 delete mode 100644 fe/src/main/java/org/apache/impala/compat/HdfsShim.java
 create mode 100644 shell/pkg_resources.py


[impala] 01/04: IMPALA-10804: [DOCS] Document spill to remote storage

Posted by mi...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 7dcf80b32e207c1078bed7aca1714ce59d2afe13
Author: Shajini Thayasingh <st...@cloudera.com>
AuthorDate: Fri Feb 3 10:40:43 2023 -0800

    IMPALA-10804: [DOCS] Document spill to remote storage
    
    Spill to HDFS, S3, and Ozone.
    
    Change-Id: I3efb2ffcc06cdbe69845c6dc4cf03d9f2e3dcabc
    Reviewed-on: http://gerrit.cloudera.org:8080/19472
    Reviewed-by: Yida Wu <wy...@gmail.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 docs/topics/impala_disk_space.xml | 106 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 106 insertions(+)

diff --git a/docs/topics/impala_disk_space.xml b/docs/topics/impala_disk_space.xml
index b32502ff1..4440f7763 100644
--- a/docs/topics/impala_disk_space.xml
+++ b/docs/topics/impala_disk_space.xml
@@ -343,6 +343,112 @@ under the License.
       <p> Compression levels from 1 up to 22 (default 3) are supported for <codeph>ZSTD</codeph>.
         The lower the compression level, the faster the speed at the cost of compression ratio.</p>
     </section>
+    <section>
+      <title>Configure Impala Daemon to spill to S3</title>
+      <p>Impala occasionally needs to use persistent storage for writing intermediate files during
+        large sorts, joins, aggregations, or analytic function operations. If your workload results
+        in large volumes of intermediate data being written, it is recommended to configure the
+        heavy spilling queries to use a remote storage location rather than the local one. The
+        advantage of using remote storage for scratch space is that it is elastic and can handle any
+        amount of spilling.</p>
+      <p><b>Before you begin</b></p>
+      <p>Identify the URL for an S3 bucket to which you want your new Impala to write the temporary
+        data. If you use the S3 bucket that is associated with the environment, navigate to the S3
+        bucket and copy the URL. If you want to use an external S3 bucket, you must first configure
+        your environment to use the external S3 bucket with the correct read/write permissions.</p>
+      <p><b>Configuring the Start-up Option in Impala daemon</b></p>
+      <p>You can use the Impalad start option scratch_dirs to specify the locations of the
+        intermediate files. The format of the option is <codeph>scratch_dirs= remote_dir, local_buffer_dir(,
+          local_dir…).</codeph></p>
+      <p>With the option specified above:</p>
+      <ul>
+        <li>You can specify only one remote directory. When you configure a remote directory, you
+          must specify a local buffer directory as the buffer. However you can use multiple local
+          directories with the remote directory. If you specify multiple local directories, the
+          first local directory would be used as the local buffer directory.</li>
+        <li>If you configure both remote and local directories, the remote directory is only used
+          when the local directories are fully utilized.</li>
+        <li>The size of a remote intermediate file could affect the query performance, and the value
+          can be set by <codeph>>remote_tmp_file_size</codeph> in the start-up option. The default
+          size of a remote intermediate file is 16MB while the maximum is 256MB.</li>
+      </ul>
+      <p><b>Examples</b></p>
+      <ul>
+        <li>A remote scratch dir with one local buffer dir, file size 64MB.
+          <codeblock>‑‑scratch_dirs="s3a://remote_dir, /local_buffer_dir" ‑‑remote_tmp_file_size=64M</codeblock></li>
+        <li>A remote scratch dir with one local buffer dir, and one local dir.
+          <codeblock>‑‑scratch_dirs="s3a://remote_dir, /local_buffer_dir, /local_dir"</codeblock></li>
+        <li>A remote scratch dir with one local buffer dir, and multiple local dirs.
+          <codeblock>‑‑scratch_dirs="s3a://remote_dir, /local_buffer_dir, /local_dir_1, /local_dir_2"</codeblock></li>
+      </ul>
+    </section>
+    <section>
+      <title>Configure Impala Daemon to spill to HDFS</title>
+      <p>Impala occasionally needs to use persistent storage for writing intermediate files during
+        large sorts, joins, aggregations, or analytic function operations. If your workload results
+        in large volumes of intermediate data being written, it is recommended to configure the
+        heavy spilling queries to use a remote storage location rather than the local one. The
+        advantage of using remote storage for scratch space is that it is elastic and can handle any
+        amount of spilling.</p>
+      <p><b>Before you begin</b></p>
+      <ul>
+        <li>Identify the HDFS scratch directory where you want your new Impala to write the
+          temporary data.</li>
+        <li>Identify the port number of the HDFS scratch directory.</li>
+        <li>Configure Impala to write temporary data to disk during query processing.</li>
+      </ul>
+      <p><b>Configuring the Start-up Option in Impala daemon</b></p>
+      <p>You can use the Impalad start option “scratch_dirs” to specify the locations of the
+        intermediate files.</p>
+      <p>Use the following format for this start up option:</p>
+      <codeblock>‑‑scratch_dirs=”hdfs://ip_address:port_num(:max_bytes)(:priority), /local_buffer_dir” ‑‑remote_tmp_file_size=xM</codeblock>
+      <ul>
+        <li>Where <codeph>“hdfs://ip_address:port_num/path(:max_bytes)(:priority)”</codeph> is the remote
+          directory.</li>
+        <li><codeph>port_num</codeph> is required for the HDFS scratch directory.</li>
+        <li><codeph>max_bytes</codeph> and <codeph>priority</codeph> are optional.</li>
+      </ul>
+      <p>Using the above format:</p>
+      <ul>
+        <li>You can specify only one remote directory.</li>
+        <li>When you configure a remote directory, you must specify a local buffer directory as the
+          buffer. However you can use multiple local directories with the remote directory. If you
+          specify multiple local directories, the first local directory would be used as the local
+          buffer directory.</li>
+        <li>If you configure both remote and local directories, the remote directory is only used
+          when the local directories are fully utilized.</li>
+        <li>The size of a remote intermediate file could affect the query performance, and the value
+          can be set by “remote_tmp_file_size” in the start-up option. The default size of a remote
+          intermediate file is 16MB while the maximum is 512MB.</li>
+      </ul>
+      <p><b>Examples</b></p>
+      <ul>
+        <li>A hdfs scratch dir with one local buffer dir, file size 64MB. The space of hdfs scratch
+          dir is limited to 300G.
+          <codeblock>‑‑scratch_dirs="hdfs://ip_address:port_num/path:300G, /local_buffer_dir" ‑‑remote_tmp_file_size=64M</codeblock></li>
+        <li>A hdfs scratch dir with one local buffer dir, and one local dir. The space of hdfs
+          scratch dir is limited to 300G.
+          <codeblock>‑‑scratch_dirs="hdfs://ip_address:port_num/path:300G, /local_buffer_dir, /local_dir"</codeblock></li>
+        <li>A hdfs scratch dir with one local buffer dir, and multiple local dirs. The space of hdfs
+          scratch dir is unlimited.
+          <codeblock>‑‑scratch_dirs="hdfs://ip_address:port_num/path, /local_buffer_dir, /local_dir_1, /local_dir_2"</codeblock></li>
+      </ul>
+      <p>Even though max_bytes is optional it is highly recommended to configure for spilling to
+        HDFS because the HDFS cluster space is limited.</p>
+    </section>
+    <section>
+      <title>Configure Impala Daemon to spill to Ozone</title>
+      <p><b>Before you begin</b></p>
+      <ul>
+        <li>Identify the Ozone scratch directory where you want your new Impala to write the
+          temporary data.</li>
+        <li>Identify the port number of the Ozone scratch directory.</li>
+      </ul>
+      <p><b>Configuring the Start-up Option in Impala daemon</b></p>
+      <p>You can use the Impalad start option “scratch_dirs” to specify the locations of the
+        intermediate files.</p>
+      <codeblock>‑‑scratch_dirs=”ofs://ip_address:port_num(:max_bytes)(:priority), /local_buffer_dir” ‑‑remote_tmp_file_size=xM</codeblock>
+    </section>
   </conbody>
 
 </concept>


[impala] 02/04: IMPALA-11883: Calculate erasure-coded bytes read directly

Posted by mi...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b858f2acdee77b4651ac84a7063fddf8fb0caae1
Author: Michael Smith <mi...@cloudera.com>
AuthorDate: Mon Jan 30 12:12:27 2023 -0800

    IMPALA-11883: Calculate erasure-coded bytes read directly
    
    Calculate the metric erasure-coded-bytes-read directly from HDFS reads
    rather than through hdfsFileGetReadStatistics. This allows us to use it
    for other filesystem implementations (Ozone).
    
    Also renumbers is_erasure_coded in THdfsFileSplit to 8, where it was
    originally before it was removed by IMPALA-9485 (and never replaced).
    
    Testing:
    - ran updated test_io_metrics.py with Ozone, with and without EC
    - ran updated test_io_metrics.py with HDFS, with and without EC
    
    Change-Id: Ide0fc806590b2328df8068a9a54645d1d1fb137c
    Reviewed-on: http://gerrit.cloudera.org:8080/19460
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
    Tested-by: Michael Smith <mi...@cloudera.com>
---
 be/src/runtime/io/hdfs-file-reader.cc | 7 ++++---
 be/src/runtime/io/request-context.h   | 2 +-
 common/thrift/PlanNodes.thrift        | 6 +++---
 tests/query_test/test_io_metrics.py   | 3 +--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/be/src/runtime/io/hdfs-file-reader.cc b/be/src/runtime/io/hdfs-file-reader.cc
index d880264c5..1d814dcc8 100644
--- a/be/src/runtime/io/hdfs-file-reader.cc
+++ b/be/src/runtime/io/hdfs-file-reader.cc
@@ -238,6 +238,10 @@ Status HdfsFileReader::ReadFromPos(DiskQueue* queue, int64_t file_offset, uint8_
       bool is_first_read = (num_remote_bytes_ == 0);
       // Collect and accumulate statistics
       GetHdfsStatistics(hdfs_file, log_slow_read);
+      if (scan_range_->is_erasure_coded()) {
+        scan_range_->reader_->bytes_read_ec_.Add(current_bytes_read);
+      }
+
       if (FLAGS_fs_trace_remote_reads && expected_local_ &&
           num_remote_bytes_ > 0 && is_first_read) {
         // Only log the first unexpected remote read for scan range
@@ -404,9 +408,6 @@ void HdfsFileReader::GetHdfsStatistics(hdfsFile hdfs_file, bool log_stats) {
       scan_range_->reader_->bytes_read_short_circuit_.Add(
           stats->totalShortCircuitBytesRead);
       scan_range_->reader_->bytes_read_dn_cache_.Add(stats->totalZeroCopyBytesRead);
-      if (scan_range_->is_erasure_coded()) {
-        scan_range_->reader_->bytes_read_ec_.Add(stats->totalBytesRead);
-      }
       if (stats->totalLocalBytesRead != stats->totalBytesRead) {
         num_remote_bytes_ += stats->totalBytesRead - stats->totalLocalBytesRead;
       }
diff --git a/be/src/runtime/io/request-context.h b/be/src/runtime/io/request-context.h
index 646908374..585a28f47 100644
--- a/be/src/runtime/io/request-context.h
+++ b/be/src/runtime/io/request-context.h
@@ -400,7 +400,7 @@ class RequestContext {
   /// Total number of bytes read from date node cache, updated at end of each range scan
   AtomicInt64 bytes_read_dn_cache_{0};
 
-  /// Total number of erasure-coded bytes read, updated at end of each range scan
+  /// Total number of erasure-coded bytes read
   AtomicInt64 bytes_read_ec_{0};
 
   /// Total number of bytes from remote reads that were expected to be local.
diff --git a/common/thrift/PlanNodes.thrift b/common/thrift/PlanNodes.thrift
index d76a25fa6..86519e4d1 100644
--- a/common/thrift/PlanNodes.thrift
+++ b/common/thrift/PlanNodes.thrift
@@ -218,6 +218,9 @@ struct THdfsFileSplit {
   // last modified time of the file
   7: required i64 mtime
 
+  // Whether the HDFS file is stored with erasure coding.
+  8: optional bool is_erasure_coded
+
   // Hash of the partition's path. This must be hashed with a hash algorithm that is
   // consistent across different processes and machines. This is currently using
   // Java's String.hashCode(), which is consistent. For testing purposes, this can use
@@ -227,9 +230,6 @@ struct THdfsFileSplit {
   // The absolute path of the file, it's used only when data files are outside of
   // the Iceberg table location (IMPALA-11507).
   10: optional string absolute_path
-
-  // Whether the HDFS file is stored with erasure coding.
-  11: optional bool is_erasure_coded
 }
 
 // Key range for single THBaseScanNode. Corresponds to HBaseKeyRangePB and should be kept
diff --git a/tests/query_test/test_io_metrics.py b/tests/query_test/test_io_metrics.py
index f75d14997..b45a0f8e7 100644
--- a/tests/query_test/test_io_metrics.py
+++ b/tests/query_test/test_io_metrics.py
@@ -47,8 +47,7 @@ class TestIOMetrics(ImpalaTestSuite):
     def append_metric(metric, expect_nonzero):
       (expect_nonzero_metrics if expect_nonzero else expect_zero_metrics).append(metric)
 
-    # IMPALA-11697: these come from getReadStatistics, which is only implemented for HDFS
-    append_metric("impala-server.io-mgr.erasure-coded-bytes-read", IS_HDFS and IS_EC)
+    append_metric("impala-server.io-mgr.erasure-coded-bytes-read", IS_EC)
     append_metric("impala-server.io-mgr.short-circuit-bytes-read",
         IS_HDFS and not IS_DOCKERIZED_TEST_CLUSTER)
     append_metric("impala-server.io-mgr.local-bytes-read",


[impala] 03/04: IMPALA-11859: Add bytes-read-encrypted metric

Posted by mi...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 8935c759042ef14e9138f9bc28adc3cede4ee96f
Author: Michael Smith <mi...@cloudera.com>
AuthorDate: Mon Jan 30 12:12:27 2023 -0800

    IMPALA-11859: Add bytes-read-encrypted metric
    
    Adds a metric bytes-read-encrypted to track encrypted reads.
    
    Testing:
    - ran test_io_metrics.py with Ozone (encrypts by default)
    - ran test_io_metrics.py with HDFS (no encryption)
    
    Change-Id: I9dbc194a4bc31cb0e01545fb6032a0853db60f34
    Reviewed-on: http://gerrit.cloudera.org:8080/19461
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exec/hdfs-scan-node-base.cc                 |  7 +++++
 be/src/exec/hdfs-scan-node-base.h                  |  7 ++++-
 be/src/exec/orc/hdfs-orc-scanner.cc                |  5 ++--
 be/src/runtime/io/hdfs-file-reader.cc              |  3 +++
 be/src/runtime/io/request-context.h                |  4 +++
 be/src/runtime/io/request-ranges.h                 |  7 ++++-
 be/src/runtime/io/scan-range.cc                    |  1 +
 be/src/scheduling/scheduler.cc                     |  2 ++
 be/src/util/impalad-metrics.cc                     |  5 ++++
 be/src/util/impalad-metrics.h                      |  4 +++
 common/fbs/CatalogObjects.fbs                      |  3 +++
 common/protobuf/planner.proto                      |  3 +++
 common/thrift/PlanNodes.thrift                     |  3 +++
 common/thrift/metrics.json                         | 10 ++++++++
 .../org/apache/impala/catalog/FeIcebergTable.java  |  5 ++--
 .../apache/impala/catalog/FileMetadataLoader.java  |  4 +--
 .../org/apache/impala/catalog/HdfsPartition.java   | 15 ++++++-----
 .../java/org/apache/impala/compat/HdfsShim.java    | 30 ----------------------
 .../org/apache/impala/planner/HdfsScanNode.java    |  1 +
 tests/query_test/test_io_metrics.py                |  3 ++-
 20 files changed, 76 insertions(+), 46 deletions(-)

diff --git a/be/src/exec/hdfs-scan-node-base.cc b/be/src/exec/hdfs-scan-node-base.cc
index 123bfc250..ca3fab885 100644
--- a/be/src/exec/hdfs-scan-node-base.cc
+++ b/be/src/exec/hdfs-scan-node-base.cc
@@ -111,6 +111,8 @@ PROFILE_DEFINE_COUNTER(BytesReadShortCircuit, STABLE_LOW, TUnit::BYTES,
     "The total number of bytes read via short circuit read");
 PROFILE_DEFINE_COUNTER(BytesReadDataNodeCache, STABLE_HIGH, TUnit::BYTES,
     "The total number of bytes read from data node cache");
+PROFILE_DEFINE_COUNTER(BytesReadEncrypted, STABLE_LOW, TUnit::BYTES,
+    "The total number of bytes read from encrypted data");
 PROFILE_DEFINE_COUNTER(BytesReadErasureCoded, STABLE_LOW, TUnit::BYTES,
     "The total number of bytes read from erasure-coded data");
 PROFILE_DEFINE_COUNTER(RemoteScanRanges, STABLE_HIGH, TUnit::UNIT,
@@ -301,6 +303,7 @@ Status HdfsScanPlanNode::ProcessScanRangesAndInitSharedState(FragmentState* stat
         file_desc->file_length = split.file_length();
         file_desc->mtime = split.mtime();
         file_desc->file_compression = CompressionTypePBToThrift(split.file_compression());
+        file_desc->is_encrypted = split.is_encrypted();
         file_desc->is_erasure_coded = split.is_erasure_coded();
         file_desc->file_metadata = file_metadata;
         if (file_metadata) {
@@ -629,6 +632,7 @@ Status HdfsScanNodeBase::Open(RuntimeState* state) {
   bytes_read_short_circuit_ =
       PROFILE_BytesReadShortCircuit.Instantiate(runtime_profile());
   bytes_read_dn_cache_ = PROFILE_BytesReadDataNodeCache.Instantiate(runtime_profile());
+  bytes_read_encrypted_ = PROFILE_BytesReadEncrypted.Instantiate(runtime_profile());
   bytes_read_ec_ = PROFILE_BytesReadErasureCoded.Instantiate(runtime_profile());
   num_remote_ranges_ = PROFILE_RemoteScanRanges.Instantiate(runtime_profile());
   unexpected_remote_bytes_ =
@@ -1236,6 +1240,7 @@ void HdfsScanNodeBase::StopAndFinalizeCounters() {
     bytes_read_local_->Set(reader_context_->bytes_read_local());
     bytes_read_short_circuit_->Set(reader_context_->bytes_read_short_circuit());
     bytes_read_dn_cache_->Set(reader_context_->bytes_read_dn_cache());
+    bytes_read_encrypted_->Set(reader_context_->bytes_read_encrypted());
     bytes_read_ec_->Set(reader_context_->bytes_read_ec());
     num_remote_ranges_->Set(reader_context_->num_remote_ranges());
     unexpected_remote_bytes_->Set(reader_context_->unexpected_remote_bytes());
@@ -1261,6 +1266,8 @@ void HdfsScanNodeBase::StopAndFinalizeCounters() {
         bytes_read_short_circuit_->value());
     ImpaladMetrics::IO_MGR_CACHED_BYTES_READ->Increment(
         bytes_read_dn_cache_->value());
+    ImpaladMetrics::IO_MGR_ENCRYPTED_BYTES_READ->Increment(
+        bytes_read_encrypted_->value());
     ImpaladMetrics::IO_MGR_ERASURE_CODED_BYTES_READ->Increment(
         bytes_read_ec_->value());
   }
diff --git a/be/src/exec/hdfs-scan-node-base.h b/be/src/exec/hdfs-scan-node-base.h
index 5c03f3e66..56a8157aa 100644
--- a/be/src/exec/hdfs-scan-node-base.h
+++ b/be/src/exec/hdfs-scan-node-base.h
@@ -72,7 +72,8 @@ struct HdfsFileDesc {
       file_format(THdfsFileFormat::TEXT) {}
 
   io::ScanRange::FileInfo GetFileInfo() const {
-    return io::ScanRange::FileInfo{filename.c_str(), fs, mtime, is_erasure_coded};
+    return io::ScanRange::FileInfo{
+        filename.c_str(), fs, mtime, is_encrypted, is_erasure_coded};
   }
 
   /// Connection to the filesystem containing the file.
@@ -97,6 +98,9 @@ struct HdfsFileDesc {
   /// Extra file metadata, e.g. Iceberg-related file-level info.
   const ::org::apache::impala::fb::FbFileMetadata* file_metadata;
 
+  /// Whether file is encrypted.
+  bool is_encrypted = false;
+
   /// Whether file is erasure coded.
   bool is_erasure_coded = false;
 
@@ -789,6 +793,7 @@ class HdfsScanNodeBase : public ScanNode {
   RuntimeProfile::Counter* bytes_read_local_ = nullptr;
   RuntimeProfile::Counter* bytes_read_short_circuit_ = nullptr;
   RuntimeProfile::Counter* bytes_read_dn_cache_ = nullptr;
+  RuntimeProfile::Counter* bytes_read_encrypted_ = nullptr;
   RuntimeProfile::Counter* bytes_read_ec_ = nullptr;
   RuntimeProfile::Counter* num_remote_ranges_ = nullptr;
   RuntimeProfile::Counter* unexpected_remote_bytes_ = nullptr;
diff --git a/be/src/exec/orc/hdfs-orc-scanner.cc b/be/src/exec/orc/hdfs-orc-scanner.cc
index f3050ff97..a4193dca5 100644
--- a/be/src/exec/orc/hdfs-orc-scanner.cc
+++ b/be/src/exec/orc/hdfs-orc-scanner.cc
@@ -156,7 +156,8 @@ Status HdfsOrcScanner::ScanRangeInputStream::readRandom(
   int cache_options = split_range->cache_options() & ~BufferOpts::USE_HDFS_CACHE;
   ScanRange* range = scanner_->scan_node_->AllocateScanRange(
       ScanRange::FileInfo{scanner_->filename(), metadata_range->fs(),
-          split_range->mtime(), split_range->is_erasure_coded()},
+          split_range->mtime(), split_range->is_encrypted(),
+          split_range->is_erasure_coded()},
       length, offset, partition_id, split_range->disk_id(), expected_local,
       BufferOpts::ReadInto(reinterpret_cast<uint8_t*>(buf), length, cache_options));
   unique_ptr<BufferDescriptor> io_buffer;
@@ -258,7 +259,7 @@ Status HdfsOrcScanner::StartColumnReading(const orc::StripeInformation& stripe)
     }
     ScanRange* scan_range = scan_node_->AllocateScanRange(
         ScanRange::FileInfo{filename(), metadata_range_->fs(), split_range->mtime(),
-            split_range->is_erasure_coded()},
+            split_range->is_encrypted(), split_range->is_erasure_coded()},
         range.length_, range.offset_, partition_id, split_range->disk_id(),
         col_range_local, BufferOpts(split_range->cache_options()));
     RETURN_IF_ERROR(
diff --git a/be/src/runtime/io/hdfs-file-reader.cc b/be/src/runtime/io/hdfs-file-reader.cc
index 1d814dcc8..c1d8c70f5 100644
--- a/be/src/runtime/io/hdfs-file-reader.cc
+++ b/be/src/runtime/io/hdfs-file-reader.cc
@@ -238,6 +238,9 @@ Status HdfsFileReader::ReadFromPos(DiskQueue* queue, int64_t file_offset, uint8_
       bool is_first_read = (num_remote_bytes_ == 0);
       // Collect and accumulate statistics
       GetHdfsStatistics(hdfs_file, log_slow_read);
+      if (scan_range_->is_encrypted()) {
+        scan_range_->reader_->bytes_read_encrypted_.Add(current_bytes_read);
+      }
       if (scan_range_->is_erasure_coded()) {
         scan_range_->reader_->bytes_read_ec_.Add(current_bytes_read);
       }
diff --git a/be/src/runtime/io/request-context.h b/be/src/runtime/io/request-context.h
index 585a28f47..8ecc2b7ee 100644
--- a/be/src/runtime/io/request-context.h
+++ b/be/src/runtime/io/request-context.h
@@ -158,6 +158,7 @@ class RequestContext {
   int64_t bytes_read_local() const { return bytes_read_local_.Load(); }
   int64_t bytes_read_short_circuit() const { return bytes_read_short_circuit_.Load(); }
   int64_t bytes_read_dn_cache() const { return bytes_read_dn_cache_.Load(); }
+  int64_t bytes_read_encrypted() const { return bytes_read_encrypted_.Load(); }
   int64_t bytes_read_ec() const { return bytes_read_ec_.Load(); }
   int num_remote_ranges() const { return num_remote_ranges_.Load(); }
   int64_t unexpected_remote_bytes() const { return unexpected_remote_bytes_.Load(); }
@@ -400,6 +401,9 @@ class RequestContext {
   /// Total number of bytes read from date node cache, updated at end of each range scan
   AtomicInt64 bytes_read_dn_cache_{0};
 
+  /// Total number of encrypted bytes read
+  AtomicInt64 bytes_read_encrypted_{0};
+
   /// Total number of erasure-coded bytes read
   AtomicInt64 bytes_read_ec_{0};
 
diff --git a/be/src/runtime/io/request-ranges.h b/be/src/runtime/io/request-ranges.h
index 01bbe2c01..2e5e9a38e 100644
--- a/be/src/runtime/io/request-ranges.h
+++ b/be/src/runtime/io/request-ranges.h
@@ -146,6 +146,7 @@ class RequestRange : public InternalQueue<RequestRange>::Node {
   int64_t offset() const { return offset_; }
   int64_t len() const { return len_; }
   int disk_id() const { return disk_id_; }
+  bool is_encrypted() const { return is_encrypted_; }
   bool is_erasure_coded() const { return is_erasure_coded_; }
   RequestType::type request_type() const { return request_type_; }
 
@@ -172,6 +173,9 @@ class RequestRange : public InternalQueue<RequestRange>::Node {
   /// Id of disk queue containing byte range.
   int disk_id_;
 
+  /// Whether file is encrypted.
+  bool is_encrypted_;
+
   /// Whether file is erasure coded.
   bool is_erasure_coded_;
 
@@ -272,6 +276,7 @@ class ScanRange : public RequestRange {
     const char *filename;
     hdfsFS fs = nullptr;
     int64_t mtime = ScanRange::INVALID_MTIME;
+    bool is_encrypted = false;
     bool is_erasure_coded = false;
   };
 
@@ -283,7 +288,7 @@ class ScanRange : public RequestRange {
 
   /// Get file info for the current scan range.
   FileInfo GetFileInfo() const {
-    return FileInfo{file_.c_str(), fs_, mtime_, is_erasure_coded_};
+    return FileInfo{file_.c_str(), fs_, mtime_, is_encrypted_, is_erasure_coded_};
   }
 
   /// Resets this scan range object with the scan range description. The scan range
diff --git a/be/src/runtime/io/scan-range.cc b/be/src/runtime/io/scan-range.cc
index 693807358..5dd5eae8f 100644
--- a/be/src/runtime/io/scan-range.cc
+++ b/be/src/runtime/io/scan-range.cc
@@ -511,6 +511,7 @@ void ScanRange::Reset(const FileInfo &fi, int64_t len, int64_t offset, int disk_
   bytes_to_read_ = len;
   offset_ = offset;
   disk_id_ = disk_id;
+  is_encrypted_ = fi.is_encrypted;
   is_erasure_coded_ = fi.is_erasure_coded;
   cache_options_ = buffer_opts.cache_options_;
   disk_file_ = disk_file;
diff --git a/be/src/scheduling/scheduler.cc b/be/src/scheduling/scheduler.cc
index 67ea5414c..947f234f2 100644
--- a/be/src/scheduling/scheduler.cc
+++ b/be/src/scheduling/scheduler.cc
@@ -134,6 +134,7 @@ Status Scheduler::GenerateScanRanges(const vector<TFileSplitGeneratorSpec>& spec
       hdfs_scan_range.__set_offset(scan_range_offset);
       hdfs_scan_range.__set_partition_id(spec.partition_id);
       hdfs_scan_range.__set_partition_path_hash(spec.partition_path_hash);
+      hdfs_scan_range.__set_is_encrypted(fb_desc->is_encrypted());
       hdfs_scan_range.__set_is_erasure_coded(fb_desc->is_ec());
       if (fb_desc->absolute_path() != nullptr) {
         hdfs_scan_range.__set_absolute_path(fb_desc->absolute_path()->str());
@@ -1124,6 +1125,7 @@ void TScanRangeToScanRangePB(const TScanRange& tscan_range, ScanRangePB* scan_ra
     hdfs_file_split->set_mtime(tscan_range.hdfs_file_split.mtime);
     hdfs_file_split->set_partition_path_hash(
         tscan_range.hdfs_file_split.partition_path_hash);
+    hdfs_file_split->set_is_encrypted(tscan_range.hdfs_file_split.is_encrypted);
     hdfs_file_split->set_is_erasure_coded(tscan_range.hdfs_file_split.is_erasure_coded);
     if (tscan_range.hdfs_file_split.__isset.absolute_path) {
       hdfs_file_split->set_absolute_path(
diff --git a/be/src/util/impalad-metrics.cc b/be/src/util/impalad-metrics.cc
index 6a2e07402..cb88adca5 100644
--- a/be/src/util/impalad-metrics.cc
+++ b/be/src/util/impalad-metrics.cc
@@ -59,6 +59,8 @@ const char* ImpaladMetricKeys::IO_MGR_SHORT_CIRCUIT_BYTES_READ =
     "impala-server.io-mgr.short-circuit-bytes-read";
 const char* ImpaladMetricKeys::IO_MGR_CACHED_BYTES_READ =
     "impala-server.io-mgr.cached-bytes-read";
+const char* ImpaladMetricKeys::IO_MGR_ENCRYPTED_BYTES_READ =
+    "impala-server.io-mgr.encrypted-bytes-read";
 const char* ImpaladMetricKeys::IO_MGR_ERASURE_CODED_BYTES_READ =
     "impala-server.io-mgr.erasure-coded-bytes-read";
 const char* ImpaladMetricKeys::IO_MGR_REMOTE_DATA_CACHE_HIT_BYTES =
@@ -169,6 +171,7 @@ IntCounter* ImpaladMetrics::IO_MGR_BYTES_READ = nullptr;
 IntCounter* ImpaladMetrics::IO_MGR_LOCAL_BYTES_READ = nullptr;
 IntCounter* ImpaladMetrics::IO_MGR_SHORT_CIRCUIT_BYTES_READ = nullptr;
 IntCounter* ImpaladMetrics::IO_MGR_CACHED_BYTES_READ = nullptr;
+IntCounter* ImpaladMetrics::IO_MGR_ENCRYPTED_BYTES_READ = nullptr;
 IntCounter* ImpaladMetrics::IO_MGR_ERASURE_CODED_BYTES_READ = nullptr;
 IntCounter* ImpaladMetrics::IO_MGR_REMOTE_DATA_CACHE_HIT_BYTES = nullptr;
 IntCounter* ImpaladMetrics::IO_MGR_REMOTE_DATA_CACHE_HIT_COUNT = nullptr;
@@ -344,6 +347,8 @@ void ImpaladMetrics::CreateMetrics(MetricGroup* m) {
       ImpaladMetricKeys::IO_MGR_LOCAL_BYTES_READ, 0);
   IO_MGR_CACHED_BYTES_READ = IO_MGR_METRICS->AddCounter(
       ImpaladMetricKeys::IO_MGR_CACHED_BYTES_READ, 0);
+  IO_MGR_ENCRYPTED_BYTES_READ = IO_MGR_METRICS->AddCounter(
+      ImpaladMetricKeys::IO_MGR_ENCRYPTED_BYTES_READ, 0);
   IO_MGR_ERASURE_CODED_BYTES_READ = IO_MGR_METRICS->AddCounter(
       ImpaladMetricKeys::IO_MGR_ERASURE_CODED_BYTES_READ, 0);
   IO_MGR_SHORT_CIRCUIT_BYTES_READ = IO_MGR_METRICS->AddCounter(
diff --git a/be/src/util/impalad-metrics.h b/be/src/util/impalad-metrics.h
index c43dca69f..08131da30 100644
--- a/be/src/util/impalad-metrics.h
+++ b/be/src/util/impalad-metrics.h
@@ -76,6 +76,9 @@ class ImpaladMetricKeys {
   /// Total number of cached bytes read by the io mgr
   static const char* IO_MGR_CACHED_BYTES_READ;
 
+  /// Total number of encrypted bytes read by the io mgr
+  static const char* IO_MGR_ENCRYPTED_BYTES_READ;
+
   /// Total number of erasure-coded bytes read by the io mgr
   static const char* IO_MGR_ERASURE_CODED_BYTES_READ;
 
@@ -261,6 +264,7 @@ class ImpaladMetrics {
   static IntCounter* IO_MGR_BYTES_READ;
   static IntCounter* IO_MGR_LOCAL_BYTES_READ;
   static IntCounter* IO_MGR_CACHED_BYTES_READ;
+  static IntCounter* IO_MGR_ENCRYPTED_BYTES_READ;
   static IntCounter* IO_MGR_ERASURE_CODED_BYTES_READ;
   static IntCounter* IO_MGR_REMOTE_DATA_CACHE_HIT_BYTES;
   static IntCounter* IO_MGR_REMOTE_DATA_CACHE_HIT_COUNT;
diff --git a/common/fbs/CatalogObjects.fbs b/common/fbs/CatalogObjects.fbs
index 8ecfb2f11..f0dc08824 100644
--- a/common/fbs/CatalogObjects.fbs
+++ b/common/fbs/CatalogObjects.fbs
@@ -83,6 +83,9 @@ table FbFileDesc {
   // The absolute path of the file, it's used only when data files are outside of
   // the Iceberg table location (IMPALA-11507).
   absolute_path: string (id: 6);
+
+  // Whether this file is encrypted
+  is_encrypted: bool = false (id: 7);
 }
 
 // Additional file-related metadata
diff --git a/common/protobuf/planner.proto b/common/protobuf/planner.proto
index 4e7c8ac63..26b4cd2fc 100644
--- a/common/protobuf/planner.proto
+++ b/common/protobuf/planner.proto
@@ -58,6 +58,9 @@ message HdfsFileSplitPB {
   // The absolute path of the file, it's used only when data files are outside of
   // the Iceberg table location (IMPALA-11507).
   optional string absolute_path = 10;
+
+  // Whether this file is encrypted.
+  optional bool is_encrypted = 11;
 }
 
 // Key range for single THBaseScanNode. Corresponds to THBaseKeyRange and should be kept
diff --git a/common/thrift/PlanNodes.thrift b/common/thrift/PlanNodes.thrift
index 86519e4d1..6e8e370be 100644
--- a/common/thrift/PlanNodes.thrift
+++ b/common/thrift/PlanNodes.thrift
@@ -230,6 +230,9 @@ struct THdfsFileSplit {
   // The absolute path of the file, it's used only when data files are outside of
   // the Iceberg table location (IMPALA-11507).
   10: optional string absolute_path
+
+  // Whether the HDFS file is stored with transparent data encryption.
+  11: optional bool is_encrypted
 }
 
 // Key range for single THBaseScanNode. Corresponds to HBaseKeyRangePB and should be kept
diff --git a/common/thrift/metrics.json b/common/thrift/metrics.json
index 5f1a68e71..403724480 100644
--- a/common/thrift/metrics.json
+++ b/common/thrift/metrics.json
@@ -489,6 +489,16 @@
     "kind": "COUNTER",
     "key": "impala-server.io-mgr.cached-bytes-read"
   },
+  {
+    "description": "Total number of encrypted bytes read by the IO manager.",
+    "contexts": [
+      "IMPALAD"
+    ],
+    "label": "Impala Server Io Mgr Encrypted Bytes Read",
+    "units": "BYTES",
+    "kind": "COUNTER",
+    "key": "impala-server.io-mgr.encrypted-bytes-read"
+  },
   {
     "description": "Total number of erasure-coded bytes read by the IO manager.",
     "contexts": [
diff --git a/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java b/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
index c9c489ab5..b3a66a0d8 100644
--- a/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
@@ -67,7 +67,6 @@ import org.apache.impala.common.FileSystemUtil;
 import org.apache.impala.common.Pair;
 import org.apache.impala.common.PrintUtils;
 import org.apache.impala.common.Reference;
-import org.apache.impala.compat.HdfsShim;
 import org.apache.impala.fb.FbFileBlock;
 import org.apache.impala.thrift.TColumn;
 import org.apache.impala.thrift.TCompressionCodec;
@@ -679,8 +678,8 @@ public interface FeIcebergTable extends FeFsTable {
       }
 
       return HdfsPartition.FileDescriptor.create(fileStatus, relPath, locations,
-          table.getHostIndex(), HdfsShim.isErasureCoded(fileStatus), numUnknownDiskIds,
-          absPath);
+          table.getHostIndex(), fileStatus.isEncrypted(), fileStatus.isErasureCoded(),
+          numUnknownDiskIds, absPath);
     }
 
     /**
diff --git a/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java b/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
index 9ebc6fa1f..9a514b05d 100644
--- a/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
+++ b/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
@@ -33,7 +33,6 @@ import org.apache.hadoop.hive.common.ValidWriteIdList;
 import org.apache.impala.catalog.HdfsPartition.FileDescriptor;
 import org.apache.impala.common.FileSystemUtil;
 import org.apache.impala.common.Reference;
-import org.apache.impala.compat.HdfsShim;
 import org.apache.impala.thrift.TNetworkAddress;
 import org.apache.impala.util.AcidUtils;
 import org.apache.impala.util.HudiUtil;
@@ -291,7 +290,8 @@ public class FileMetadataLoader {
       locations = fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
     }
     return FileDescriptor.create(fileStatus, relPath, locations, hostIndex_,
-        HdfsShim.isErasureCoded(fileStatus), numUnknownDiskIds, absPath);
+        fileStatus.isEncrypted(), fileStatus.isErasureCoded(), numUnknownDiskIds,
+        absPath);
   }
 
   private FileDescriptor createFd(FileSystem fs, FileStatus fileStatus,
diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
index 02e285ba5..034790fd9 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
@@ -191,8 +191,9 @@ public class HdfsPartition extends CatalogObjectImpl
      *                          for which no disk ID could be determined
      */
     public static FileDescriptor create(FileStatus fileStatus, String relPath,
-        BlockLocation[] blockLocations, ListMap<TNetworkAddress> hostIndex, boolean isEc,
-        Reference<Long> numUnknownDiskIds, String absPath) throws IOException {
+        BlockLocation[] blockLocations, ListMap<TNetworkAddress> hostIndex,
+        boolean isEncrypted, boolean isEc, Reference<Long> numUnknownDiskIds,
+        String absPath) throws IOException {
       FlatBufferBuilder fbb = new FlatBufferBuilder(1);
       int[] fbFileBlockOffsets = new int[blockLocations.length];
       int blockIdx = 0;
@@ -207,7 +208,7 @@ public class HdfsPartition extends CatalogObjectImpl
         }
       }
       return new FileDescriptor(createFbFileDesc(fbb, fileStatus, relPath,
-          fbFileBlockOffsets, isEc, absPath));
+          fbFileBlockOffsets, isEncrypted, isEc, absPath));
     }
 
     /**
@@ -218,7 +219,7 @@ public class HdfsPartition extends CatalogObjectImpl
         FileStatus fileStatus, String relPath, String absPath) {
       FlatBufferBuilder fbb = new FlatBufferBuilder(1);
       return new FileDescriptor(
-          createFbFileDesc(fbb, fileStatus, relPath, null, false, absPath));
+          createFbFileDesc(fbb, fileStatus, relPath, null, false, false, absPath));
     }
     /**
      * Serializes the metadata of a file descriptor represented by 'fileStatus' into a
@@ -227,8 +228,8 @@ public class HdfsPartition extends CatalogObjectImpl
      * in the underlying buffer. Can be null if there are no blocks.
      */
     private static FbFileDesc createFbFileDesc(FlatBufferBuilder fbb,
-        FileStatus fileStatus, String relPath, int[] fbFileBlockOffsets, boolean isEc,
-        String absPath) {
+        FileStatus fileStatus, String relPath, int[] fbFileBlockOffsets,
+        boolean isEncrypted, boolean isEc, String absPath) {
       int relPathOffset = fbb.createString(relPath == null ? StringUtils.EMPTY : relPath);
       // A negative block vector offset is used when no block offsets are specified.
       int blockVectorOffset = -1;
@@ -242,6 +243,7 @@ public class HdfsPartition extends CatalogObjectImpl
       FbFileDesc.addRelativePath(fbb, relPathOffset);
       FbFileDesc.addLength(fbb, fileStatus.getLen());
       FbFileDesc.addLastModificationTime(fbb, fileStatus.getModificationTime());
+      FbFileDesc.addIsEncrypted(fbb, isEncrypted);
       FbFileDesc.addIsEc(fbb, isEc);
       HdfsCompression comp = HdfsCompression.fromFileName(fileStatus.getPath().getName());
       FbFileDesc.addCompression(fbb, comp.toFb());
@@ -288,6 +290,7 @@ public class HdfsPartition extends CatalogObjectImpl
 
     public long getModificationTime() { return fbFileDescriptor_.lastModificationTime(); }
     public int getNumFileBlocks() { return fbFileDescriptor_.fileBlocksLength(); }
+    public boolean getIsEncrypted() {return fbFileDescriptor_.isEncrypted(); }
     public boolean getIsEc() {return fbFileDescriptor_.isEc(); }
 
     public FbFileBlock getFbFileBlock(int idx) {
diff --git a/fe/src/main/java/org/apache/impala/compat/HdfsShim.java b/fe/src/main/java/org/apache/impala/compat/HdfsShim.java
deleted file mode 100644
index 9453f80a8..000000000
--- a/fe/src/main/java/org/apache/impala/compat/HdfsShim.java
+++ /dev/null
@@ -1,30 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-package org.apache.impala.compat;
-
-import org.apache.hadoop.fs.FileStatus;
-
-/**
- * Wrapper classes to abstract away differences between HDFS versions in
- * the MiniCluster profiles.
- */
-public class HdfsShim {
-  public static boolean isErasureCoded(FileStatus fileStatus) {
-    return fileStatus.isErasureCoded();
-  }
-}
diff --git a/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java b/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
index d638ea1a1..3d8243f4d 100644
--- a/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
@@ -1427,6 +1427,7 @@ public class HdfsScanNode extends ScanNode {
             fileDesc.getFileCompression().toThrift(), fileDesc.getModificationTime(),
             partition.getLocation().hashCode());
         hdfsFileSplit.setAbsolute_path(fileDesc.getAbsolutePath());
+        hdfsFileSplit.setIs_encrypted(fileDesc.getIsEncrypted());
         hdfsFileSplit.setIs_erasure_coded(fileDesc.getIsEc());
         scanRange.setHdfs_file_split(hdfsFileSplit);
         if (fileDesc.getFbFileMetadata() != null) {
diff --git a/tests/query_test/test_io_metrics.py b/tests/query_test/test_io_metrics.py
index b45a0f8e7..5388f85b6 100644
--- a/tests/query_test/test_io_metrics.py
+++ b/tests/query_test/test_io_metrics.py
@@ -20,7 +20,7 @@ import pytest
 from tests.common.environ import IS_DOCKERIZED_TEST_CLUSTER
 from tests.common.impala_test_suite import ImpalaTestSuite, LOG
 from tests.common.test_dimensions import create_single_exec_option_dimension
-from tests.util.filesystem_utils import IS_EC, IS_HDFS
+from tests.util.filesystem_utils import IS_EC, IS_HDFS, IS_ENCRYPTED
 
 
 class TestIOMetrics(ImpalaTestSuite):
@@ -47,6 +47,7 @@ class TestIOMetrics(ImpalaTestSuite):
     def append_metric(metric, expect_nonzero):
       (expect_nonzero_metrics if expect_nonzero else expect_zero_metrics).append(metric)
 
+    append_metric("impala-server.io-mgr.encrypted-bytes-read", IS_ENCRYPTED)
     append_metric("impala-server.io-mgr.erasure-coded-bytes-read", IS_EC)
     append_metric("impala-server.io-mgr.short-circuit-bytes-read",
         IS_HDFS and not IS_DOCKERIZED_TEST_CLUSTER)


[impala] 04/04: IMPALA-11892: Restore pkg_resources with Python 2

Posted by mi...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit eca34cc9817397f0d6935b72d37974fafda85426
Author: Michael Smith <mi...@cloudera.com>
AuthorDate: Mon Jan 23 16:03:57 2023 -0800

    IMPALA-11892: Restore pkg_resources with Python 2
    
    Impala's shell tarball used to include a copy of pkg_resources.py (from
    setuptools); due to the Python version we use for packaging, all modules
    with native libraries use pkg_resources to load the library. It was
    removed in IMPALA-9718 because Impala's copy of pkg_resources didn't
    work with Python 3.
    
    Some platforms - RHEL 7 for Python 2, Ubuntu/Debian - don't install
    setuptools by default as part of the python package, which causes
    impala-shell to error with "ImportError: No module named pkg_resources".
    
    Restores Impala's copy of pkg_resources.py to PYTHONPATH when running
    impala-shell under Python 2. Omits it for Python 3 so we use updated
    setuptools when available. python-setuptools will still be a manual
    requirement with Python 3.
    
    Testing
    - manually confirmed impala-shell starts in Ubuntu 20.04 docker
      container after 'apt install python' (omits setuptools).
    - manually confirmed impala-shell starts in Ubuntu 20.04 docker
      container after 'apt install python3-setuptools' (includes python).
    
    Change-Id: I78c05bce75ecc68de2296b1c2e57cd3c17c3cb0a
    Reviewed-on: http://gerrit.cloudera.org:8080/19467
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 shell/impala-shell          |    6 +-
 shell/make_shell_tarball.sh |    6 +-
 shell/pkg_resources.py      | 2700 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 2710 insertions(+), 2 deletions(-)

diff --git a/shell/impala-shell b/shell/impala-shell
index f41d36f71..651db863d 100755
--- a/shell/impala-shell
+++ b/shell/impala-shell
@@ -59,5 +59,9 @@ for EGG in $(ls ${SHELL_HOME}/ext-py${PYTHON_VERSION}/*.egg); do
    EGG_PATH="${EGG}:${EGG_PATH}"
 done
 
-PYTHONPATH="${EGG_PATH}${SHELL_HOME}/gen-py:${SHELL_HOME}/lib:${PYTHONPATH}" \
+LEGACY=
+if [ ${PYTHON_VERSION} -eq 2 ]; then
+  LEGACY=":${SHELL_HOME}/legacy"
+fi
+PYTHONPATH="${EGG_PATH}${SHELL_HOME}/gen-py:${SHELL_HOME}/lib:${PYTHONPATH}${LEGACY}" \
   PYTHONIOENCODING='utf-8' exec ${PYTHON_EXE} ${SHELL_HOME}/impala_shell.py "$@"
diff --git a/shell/make_shell_tarball.sh b/shell/make_shell_tarball.sh
index e50931000..6a2273b44 100755
--- a/shell/make_shell_tarball.sh
+++ b/shell/make_shell_tarball.sh
@@ -63,13 +63,15 @@ TARBALL_ROOT=${BUILD_DIR}/impala-shell-${VERSION}
 
 THRIFT_GEN_PY_DIR="${SHELL_HOME}/gen-py"
 
-echo "Deleting all files in ${TARBALL_ROOT}/{gen-py,lib,ext-py*}"
+echo "Deleting all files in ${TARBALL_ROOT}/{gen-py,lib,ext-py*,legacy}"
 rm -rf ${TARBALL_ROOT}/lib/* 2>&1 > /dev/null
 rm -rf ${TARBALL_ROOT}/gen-py/* 2>&1 > /dev/null
 rm -rf ${TARBALL_ROOT}/ext-py*/* 2>&1 > /dev/null
+rm -rf ${TARBALL_ROOT}/legacy/* 2>&1 > /dev/null
 mkdir -p ${TARBALL_ROOT}/lib
 mkdir -p ${TARBALL_ROOT}/ext-py2
 mkdir -p ${TARBALL_ROOT}/ext-py3
+mkdir -p ${TARBALL_ROOT}/legacy
 
 rm -f ${THRIFT_GEN_PY_DIR}/impala_build_version.py
 cat > ${THRIFT_GEN_PY_DIR}/impala_build_version.py <<EOF
@@ -158,6 +160,8 @@ cp ${SHELL_HOME}/impala_shell.py ${TARBALL_ROOT}
 cp ${SHELL_HOME}/compatibility.py ${TARBALL_ROOT}
 cp ${SHELL_HOME}/thrift_printer.py ${TARBALL_ROOT}
 
+cp ${SHELL_HOME}/pkg_resources.py ${TARBALL_ROOT}/legacy
+
 pushd ${BUILD_DIR} > /dev/null
 echo "Making tarball in ${BUILD_DIR}"
 tar czf ${BUILD_DIR}/impala-shell-${VERSION}.tar.gz --exclude="*.pyc" \
diff --git a/shell/pkg_resources.py b/shell/pkg_resources.py
new file mode 100644
index 000000000..70ecc44d5
--- /dev/null
+++ b/shell/pkg_resources.py
@@ -0,0 +1,2700 @@
+from __future__ import print_function, unicode_literals
+
+"""
+  This file is redistributed under the Python Software Foundation License:
+  http://docs.python.org/2/license.html
+"""
+
+"""Package resource API
+--------------------
+
+A resource is a logical file contained within a package, or a logical
+subdirectory thereof.  The package resource API expects resource names
+to have their path parts separated with ``/``, *not* whatever the local
+path separator is.  Do not use os.path operations to manipulate resource
+names being passed into the API.
+
+The package resource API is designed to work with normal filesystem packages,
+.egg files, and unpacked .egg files.  It can also work in a limited way with
+.zip files and with custom PEP 302 loaders that support the ``get_data()``
+method.
+"""
+
+import sys, os, zipimport, time, re, imp, types
+from urlparse import urlparse, urlunparse
+
+try:
+    frozenset
+except NameError:
+    from sets import ImmutableSet as frozenset
+
+# capture these to bypass sandboxing
+from os import utime
+try:
+    from os import mkdir, rename, unlink
+    WRITE_SUPPORT = True
+except ImportError:
+    # no write support, probably under GAE
+    WRITE_SUPPORT = False
+
+from os import open as os_open
+from os.path import isdir, split
+
+# This marker is used to simplify the process that checks is the
+# setuptools package was installed by the Setuptools project
+# or by the Distribute project, in case Setuptools creates
+# a distribution with the same version.
+#
+# The bootstrapping script for instance, will check if this
+# attribute is present to decide wether to reinstall the package
+_distribute = True
+
+def _bypass_ensure_directory(name, mode=0777):
+    # Sandbox-bypassing version of ensure_directory()
+    if not WRITE_SUPPORT:
+        raise IOError('"os.mkdir" not supported on this platform.')
+    dirname, filename = split(name)
+    if dirname and filename and not isdir(dirname):
+        _bypass_ensure_directory(dirname)
+        mkdir(dirname, mode)
+
+
+
+
+
+
+
+
+def get_supported_platform():
+    """Return this platform's maximum compatible version.
+
+    distutils.util.get_platform() normally reports the minimum version
+    of Mac OS X that would be required to *use* extensions produced by
+    distutils.  But what we want when checking compatibility is to know the
+    version of Mac OS X that we are *running*.  To allow usage of packages that
+    explicitly require a newer version of Mac OS X, we must also know the
+    current version of the OS.
+
+    If this condition occurs for any other platform with a version in its
+    platform strings, this function should be extended accordingly.
+    """
+    plat = get_build_platform(); m = macosVersionString.match(plat)
+    if m is not None and sys.platform == "darwin":
+        try:
+            plat = 'macosx-%s-%s' % ('.'.join(_macosx_vers()[:2]), m.group(3))
+        except ValueError:
+            pass    # not Mac OS X
+    return plat
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+__all__ = [
+    # Basic resource access and distribution/entry point discovery
+    'require', 'run_script', 'get_provider',  'get_distribution',
+    'load_entry_point', 'get_entry_map', 'get_entry_info', 'iter_entry_points',
+    'resource_string', 'resource_stream', 'resource_filename',
+    'resource_listdir', 'resource_exists', 'resource_isdir',
+
+    # Environmental control
+    'declare_namespace', 'working_set', 'add_activation_listener',
+    'find_distributions', 'set_extraction_path', 'cleanup_resources',
+    'get_default_cache',
+
+    # Primary implementation classes
+    'Environment', 'WorkingSet', 'ResourceManager',
+    'Distribution', 'Requirement', 'EntryPoint',
+
+    # Exceptions
+    'ResolutionError','VersionConflict','DistributionNotFound','UnknownExtra',
+    'ExtractionError',
+
+    # Parsing functions and string utilities
+    'parse_requirements', 'parse_version', 'safe_name', 'safe_version',
+    'get_platform', 'compatible_platforms', 'yield_lines', 'split_sections',
+    'safe_extra', 'to_filename',
+
+    # filesystem utilities
+    'ensure_directory', 'normalize_path',
+
+    # Distribution "precedence" constants
+    'EGG_DIST', 'BINARY_DIST', 'SOURCE_DIST', 'CHECKOUT_DIST', 'DEVELOP_DIST',
+
+    # "Provider" interfaces, implementations, and registration/lookup APIs
+    'IMetadataProvider', 'IResourceProvider', 'FileMetadata',
+    'PathMetadata', 'EggMetadata', 'EmptyProvider', 'empty_provider',
+    'NullProvider', 'EggProvider', 'DefaultProvider', 'ZipProvider',
+    'register_finder', 'register_namespace_handler', 'register_loader_type',
+    'fixup_namespace_packages', 'get_importer',
+
+    # Deprecated/backward compatibility only
+    'run_main', 'AvailableDistributions',
+]
+class ResolutionError(Exception):
+    """Abstract base for dependency resolution errors"""
+    def __repr__(self):
+        return self.__class__.__name__+repr(self.args)
+
+class VersionConflict(ResolutionError):
+    """An already-installed version conflicts with the requested version"""
+
+class DistributionNotFound(ResolutionError):
+    """A requested distribution was not found"""
+
+class UnknownExtra(ResolutionError):
+    """Distribution doesn't have an "extra feature" of the given name"""
+_provider_factories = {}
+
+PY_MAJOR = sys.version[:3]
+EGG_DIST    = 3
+BINARY_DIST = 2
+SOURCE_DIST = 1
+CHECKOUT_DIST = 0
+DEVELOP_DIST = -1
+
+def register_loader_type(loader_type, provider_factory):
+    """Register `provider_factory` to make providers for `loader_type`
+
+    `loader_type` is the type or class of a PEP 302 ``module.__loader__``,
+    and `provider_factory` is a function that, passed a *module* object,
+    returns an ``IResourceProvider`` for that module.
+    """
+    _provider_factories[loader_type] = provider_factory
+
+def get_provider(moduleOrReq):
+    """Return an IResourceProvider for the named module or requirement"""
+    if isinstance(moduleOrReq,Requirement):
+        return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
+    try:
+        module = sys.modules[moduleOrReq]
+    except KeyError:
+        __import__(moduleOrReq)
+        module = sys.modules[moduleOrReq]
+    loader = getattr(module, '__loader__', None)
+    return _find_adapter(_provider_factories, loader)(module)
+
+def _macosx_vers(_cache=[]):
+    if not _cache:
+        import platform
+        version = platform.mac_ver()[0]
+        # fallback for MacPorts
+        if version == '':
+            import plistlib
+            plist = '/System/Library/CoreServices/SystemVersion.plist'
+            if os.path.exists(plist):
+                if hasattr(plistlib, 'readPlist'):
+                    plist_content = plistlib.readPlist(plist)
+                    if 'ProductVersion' in plist_content:
+                        version = plist_content['ProductVersion']
+
+        _cache.append(version.split('.'))
+    return _cache[0]
+
+def _macosx_arch(machine):
+    return {'PowerPC':'ppc', 'Power_Macintosh':'ppc'}.get(machine,machine)
+
+def get_build_platform():
+    """Return this platform's string for platform-specific distributions
+
+    XXX Currently this is the same as ``distutils.util.get_platform()``, but it
+    needs some hacks for Linux and Mac OS X.
+    """
+    try:
+        from distutils.util import get_platform
+    except ImportError:
+        from sysconfig import get_platform
+
+    plat = get_platform()
+    if sys.platform == "darwin" and not plat.startswith('macosx-'):
+        try:
+            version = _macosx_vers()
+            machine = os.uname()[4].replace(" ", "_")
+            return "macosx-%d.%d-%s" % (int(version[0]), int(version[1]),
+                _macosx_arch(machine))
+        except ValueError:
+            # if someone is running a non-Mac darwin system, this will fall
+            # through to the default implementation
+            pass
+    return plat
+
+macosVersionString = re.compile(r"macosx-(\d+)\.(\d+)-(.*)")
+darwinVersionString = re.compile(r"darwin-(\d+)\.(\d+)\.(\d+)-(.*)")
+get_platform = get_build_platform   # XXX backward compat
+
+def compatible_platforms(provided,required):
+    """Can code for the `provided` platform run on the `required` platform?
+
+    Returns true if either platform is ``None``, or the platforms are equal.
+
+    XXX Needs compatibility checks for Linux and other unixy OSes.
+    """
+    if provided is None or required is None or provided==required:
+        return True     # easy case
+
+    # Mac OS X special cases
+    reqMac = macosVersionString.match(required)
+    if reqMac:
+        provMac = macosVersionString.match(provided)
+
+        # is this a Mac package?
+        if not provMac:
+            # this is backwards compatibility for packages built before
+            # setuptools 0.6. All packages built after this point will
+            # use the new macosx designation.
+            provDarwin = darwinVersionString.match(provided)
+            if provDarwin:
+                dversion = int(provDarwin.group(1))
+                macosversion = "%s.%s" % (reqMac.group(1), reqMac.group(2))
+                if dversion == 7 and macosversion >= "10.3" or \
+                    dversion == 8 and macosversion >= "10.4":
+
+                    #import warnings
+                    #warnings.warn("Mac eggs should be rebuilt to "
+                    #    "use the macosx designation instead of darwin.",
+                    #    category=DeprecationWarning)
+                    return True
+            return False    # egg isn't macosx or legacy darwin
+
+        # are they the same major version and machine type?
+        if provMac.group(1) != reqMac.group(1) or \
+            provMac.group(3) != reqMac.group(3):
+            return False
+
+
+
+        # is the required OS major update >= the provided one?
+        if int(provMac.group(2)) > int(reqMac.group(2)):
+            return False
+
+        return True
+
+    # XXX Linux and other platforms' special cases should go here
+    return False
+
+
+def run_script(dist_spec, script_name):
+    """Locate distribution `dist_spec` and run its `script_name` script"""
+    ns = sys._getframe(1).f_globals
+    name = ns['__name__']
+    ns.clear()
+    ns['__name__'] = name
+    require(dist_spec)[0].run_script(script_name, ns)
+
+run_main = run_script   # backward compatibility
+
+def get_distribution(dist):
+    """Return a current distribution object for a Requirement or string"""
+    if isinstance(dist,basestring): dist = Requirement.parse(dist)
+    if isinstance(dist,Requirement): dist = get_provider(dist)
+    if not isinstance(dist,Distribution):
+        raise TypeError("Expected string, Requirement, or Distribution", dist)
+    return dist
+
+def load_entry_point(dist, group, name):
+    """Return `name` entry point of `group` for `dist` or raise ImportError"""
+    return get_distribution(dist).load_entry_point(group, name)
+
+def get_entry_map(dist, group=None):
+    """Return the entry point map for `group`, or the full entry map"""
+    return get_distribution(dist).get_entry_map(group)
+
+def get_entry_info(dist, group, name):
+    """Return the EntryPoint object for `group`+`name`, or ``None``"""
+    return get_distribution(dist).get_entry_info(group, name)
+
+
+class IMetadataProvider:
+
+    def has_metadata(name):
+        """Does the package's distribution contain the named metadata?"""
+
+    def get_metadata(name):
+        """The named metadata resource as a string"""
+
+    def get_metadata_lines(name):
+        """Yield named metadata resource as list of non-blank non-comment lines
+
+       Leading and trailing whitespace is stripped from each line, and lines
+       with ``#`` as the first non-blank character are omitted."""
+
+    def metadata_isdir(name):
+        """Is the named metadata a directory?  (like ``os.path.isdir()``)"""
+
+    def metadata_listdir(name):
+        """List of metadata names in the directory (like ``os.listdir()``)"""
+
+    def run_script(script_name, namespace):
+        """Execute the named script in the supplied namespace dictionary"""
+
+
+
+
+
+
+
+
+
+
+class IResourceProvider(IMetadataProvider):
+    """An object that provides access to package resources"""
+
+    def get_resource_filename(manager, resource_name):
+        """Return a true filesystem path for `resource_name`
+
+        `manager` must be an ``IResourceManager``"""
+
+    def get_resource_stream(manager, resource_name):
+        """Return a readable file-like object for `resource_name`
+
+        `manager` must be an ``IResourceManager``"""
+
+    def get_resource_string(manager, resource_name):
+        """Return a string containing the contents of `resource_name`
+
+        `manager` must be an ``IResourceManager``"""
+
+    def has_resource(resource_name):
+        """Does the package contain the named resource?"""
+
+    def resource_isdir(resource_name):
+        """Is the named resource a directory?  (like ``os.path.isdir()``)"""
+
+    def resource_listdir(resource_name):
+        """List of resource names in the directory (like ``os.listdir()``)"""
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+class WorkingSet(object):
+    """A collection of active distributions on sys.path (or a similar list)"""
+
+    def __init__(self, entries=None):
+        """Create working set from list of path entries (default=sys.path)"""
+        self.entries = []
+        self.entry_keys = {}
+        self.by_key = {}
+        self.callbacks = []
+
+        if entries is None:
+            entries = sys.path
+
+        for entry in entries:
+            self.add_entry(entry)
+
+
+    def add_entry(self, entry):
+        """Add a path item to ``.entries``, finding any distributions on it
+
+        ``find_distributions(entry,True)`` is used to find distributions
+        corresponding to the path entry, and they are added.  `entry` is
+        always appended to ``.entries``, even if it is already present.
+        (This is because ``sys.path`` can contain the same value more than
+        once, and the ``.entries`` of the ``sys.path`` WorkingSet should always
+        equal ``sys.path``.)
+        """
+        self.entry_keys.setdefault(entry, [])
+        self.entries.append(entry)
+        for dist in find_distributions(entry, True):
+            self.add(dist, entry, False)
+
+
+    def __contains__(self,dist):
+        """True if `dist` is the active distribution for its project"""
+        return self.by_key.get(dist.key) == dist
+
+
+
+
+
+    def find(self, req):
+        """Find a distribution matching requirement `req`
+
+        If there is an active distribution for the requested project, this
+        returns it as long as it meets the version requirement specified by
+        `req`.  But, if there is an active distribution for the project and it
+        does *not* meet the `req` requirement, ``VersionConflict`` is raised.
+        If there is no active distribution for the requested project, ``None``
+        is returned.
+        """
+        dist = self.by_key.get(req.key)
+        if dist is not None and dist not in req:
+            raise VersionConflict(dist,req)     # XXX add more info
+        else:
+            return dist
+
+    def iter_entry_points(self, group, name=None):
+        """Yield entry point objects from `group` matching `name`
+
+        If `name` is None, yields all entry points in `group` from all
+        distributions in the working set, otherwise only ones matching
+        both `group` and `name` are yielded (in distribution order).
+        """
+        for dist in self:
+            entries = dist.get_entry_map(group)
+            if name is None:
+                for ep in entries.values():
+                    yield ep
+            elif name in entries:
+                yield entries[name]
+
+    def run_script(self, requires, script_name):
+        """Locate distribution for `requires` and run `script_name` script"""
+        ns = sys._getframe(1).f_globals
+        name = ns['__name__']
+        ns.clear()
+        ns['__name__'] = name
+        self.require(requires)[0].run_script(script_name, ns)
+
+
+
+    def __iter__(self):
+        """Yield distributions for non-duplicate projects in the working set
+
+        The yield order is the order in which the items' path entries were
+        added to the working set.
+        """
+        seen = {}
+        for item in self.entries:
+            for key in self.entry_keys[item]:
+                if key not in seen:
+                    seen[key]=1
+                    yield self.by_key[key]
+
+    def add(self, dist, entry=None, insert=True):
+        """Add `dist` to working set, associated with `entry`
+
+        If `entry` is unspecified, it defaults to the ``.location`` of `dist`.
+        On exit from this routine, `entry` is added to the end of the working
+        set's ``.entries`` (if it wasn't already present).
+
+        `dist` is only added to the working set if it's for a project that
+        doesn't already have a distribution in the set.  If it's added, any
+        callbacks registered with the ``subscribe()`` method will be called.
+        """
+        if insert:
+            dist.insert_on(self.entries, entry)
+
+        if entry is None:
+            entry = dist.location
+        keys = self.entry_keys.setdefault(entry,[])
+        keys2 = self.entry_keys.setdefault(dist.location,[])
+        if dist.key in self.by_key:
+            return      # ignore hidden distros
+
+        self.by_key[dist.key] = dist
+        if dist.key not in keys:
+            keys.append(dist.key)
+        if dist.key not in keys2:
+            keys2.append(dist.key)
+        self._added_new(dist)
+
+    def resolve(self, requirements, env=None, installer=None, replacement=True):
+        """List all distributions needed to (recursively) meet `requirements`
+
+        `requirements` must be a sequence of ``Requirement`` objects.  `env`,
+        if supplied, should be an ``Environment`` instance.  If
+        not supplied, it defaults to all distributions available within any
+        entry or distribution in the working set.  `installer`, if supplied,
+        will be invoked with each requirement that cannot be met by an
+        already-installed distribution; it should return a ``Distribution`` or
+        ``None``.
+        """
+
+        requirements = list(requirements)[::-1]  # set up the stack
+        processed = {}  # set of processed requirements
+        best = {}  # key -> dist
+        to_activate = []
+
+        while requirements:
+            req = requirements.pop(0)   # process dependencies breadth-first
+            if _override_setuptools(req) and replacement:
+                req = Requirement.parse('distribute')
+
+            if req in processed:
+                # Ignore cyclic or redundant dependencies
+                continue
+            dist = best.get(req.key)
+            if dist is None:
+                # Find the best distribution and add it to the map
+                dist = self.by_key.get(req.key)
+                if dist is None:
+                    if env is None:
+                        env = Environment(self.entries)
+                    dist = best[req.key] = env.best_match(req, self, installer)
+                    if dist is None:
+                        #msg = ("The '%s' distribution was not found on this "
+                        #       "system, and is required by this application.")
+                        #raise DistributionNotFound(msg % req)
+
+                        # unfortunately, zc.buildout uses a str(err)
+                        # to get the name of the distribution here..
+                        raise DistributionNotFound(req)
+                to_activate.append(dist)
+            if dist not in req:
+                # Oops, the "best" so far conflicts with a dependency
+                raise VersionConflict(dist,req) # XXX put more info here
+            requirements.extend(dist.requires(req.extras)[::-1])
+            processed[req] = True
+
+        return to_activate    # return list of distros to activate
+
+    def find_plugins(self,
+        plugin_env, full_env=None, installer=None, fallback=True
+    ):
+        """Find all activatable distributions in `plugin_env`
+
+        Example usage::
+
+            distributions, errors = working_set.find_plugins(
+                Environment(plugin_dirlist)
+            )
+            map(working_set.add, distributions)  # add plugins+libs to sys.path
+            print 'Could not load', errors        # display errors
+
+        The `plugin_env` should be an ``Environment`` instance that contains
+        only distributions that are in the project's "plugin directory" or
+        directories. The `full_env`, if supplied, should be an ``Environment``
+        contains all currently-available distributions.  If `full_env` is not
+        supplied, one is created automatically from the ``WorkingSet`` this
+        method is called on, which will typically mean that every directory on
+        ``sys.path`` will be scanned for distributions.
+
+        `installer` is a standard installer callback as used by the
+        ``resolve()`` method. The `fallback` flag indicates whether we should
+        attempt to resolve older versions of a plugin if the newest version
+        cannot be resolved.
+
+        This method returns a 2-tuple: (`distributions`, `error_info`), where
+        `distributions` is a list of the distributions found in `plugin_env`
+        that were loadable, along with any other distributions that are needed
+        to resolve their dependencies.  `error_info` is a dictionary mapping
+        unloadable plugin distributions to an exception instance describing the
+        error that occurred. Usually this will be a ``DistributionNotFound`` or
+        ``VersionConflict`` instance.
+        """
+
+        plugin_projects = list(plugin_env)
+        plugin_projects.sort()  # scan project names in alphabetic order
+
+        error_info = {}
+        distributions = {}
+
+        if full_env is None:
+            env = Environment(self.entries)
+            env += plugin_env
+        else:
+            env = full_env + plugin_env
+
+        shadow_set = self.__class__([])
+        map(shadow_set.add, self)   # put all our entries in shadow_set
+
+        for project_name in plugin_projects:
+
+            for dist in plugin_env[project_name]:
+
+                req = [dist.as_requirement()]
+
+                try:
+                    resolvees = shadow_set.resolve(req, env, installer)
+
+                except ResolutionError as v:
+                    error_info[dist] = v    # save error info
+                    if fallback:
+                        continue    # try the next older version of project
+                    else:
+                        break       # give up on this project, keep going
+
+                else:
+                    map(shadow_set.add, resolvees)
+                    distributions.update(dict.fromkeys(resolvees))
+
+                    # success, no need to try any more versions of this project
+                    break
+
+        distributions = list(distributions)
+        distributions.sort()
+
+        return distributions, error_info
+
+
+
+
+
+    def require(self, *requirements):
+        """Ensure that distributions matching `requirements` are activated
+
+        `requirements` must be a string or a (possibly-nested) sequence
+        thereof, specifying the distributions and versions required.  The
+        return value is a sequence of the distributions that needed to be
+        activated to fulfill the requirements; all relevant distributions are
+        included, even if they were already activated in this working set.
+        """
+
+        needed = self.resolve(parse_requirements(requirements))
+
+        for dist in needed:
+            self.add(dist)
+
+        return needed
+
+
+    def subscribe(self, callback):
+        """Invoke `callback` for all distributions (including existing ones)"""
+        if callback in self.callbacks:
+            return
+        self.callbacks.append(callback)
+        for dist in self:
+            callback(dist)
+
+
+    def _added_new(self, dist):
+        for callback in self.callbacks:
+            callback(dist)
+
+
+
+
+
+
+
+
+
+
+
+class Environment(object):
+    """Searchable snapshot of distributions on a search path"""
+
+    def __init__(self, search_path=None, platform=get_supported_platform(), python=PY_MAJOR):
+        """Snapshot distributions available on a search path
+
+        Any distributions found on `search_path` are added to the environment.
+        `search_path` should be a sequence of ``sys.path`` items.  If not
+        supplied, ``sys.path`` is used.
+
+        `platform` is an optional string specifying the name of the platform
+        that platform-specific distributions must be compatible with.  If
+        unspecified, it defaults to the current platform.  `python` is an
+        optional string naming the desired version of Python (e.g. ``'2.4'``);
+        it defaults to the current version.
+
+        You may explicitly set `platform` (and/or `python`) to ``None`` if you
+        wish to map *all* distributions, not just those compatible with the
+        running platform or Python version.
+        """
+        self._distmap = {}
+        self._cache = {}
+        self.platform = platform
+        self.python = python
+        self.scan(search_path)
+
+    def can_add(self, dist):
+        """Is distribution `dist` acceptable for this environment?
+
+        The distribution must match the platform and python version
+        requirements specified when this environment was created, or False
+        is returned.
+        """
+        return (self.python is None or dist.py_version is None
+            or dist.py_version==self.python) \
+           and compatible_platforms(dist.platform,self.platform)
+
+    def remove(self, dist):
+        """Remove `dist` from the environment"""
+        self._distmap[dist.key].remove(dist)
+
+    def scan(self, search_path=None):
+        """Scan `search_path` for distributions usable in this environment
+
+        Any distributions found are added to the environment.
+        `search_path` should be a sequence of ``sys.path`` items.  If not
+        supplied, ``sys.path`` is used.  Only distributions conforming to
+        the platform/python version defined at initialization are added.
+        """
+        if search_path is None:
+            search_path = sys.path
+
+        for item in search_path:
+            for dist in find_distributions(item):
+                self.add(dist)
+
+    def __getitem__(self,project_name):
+        """Return a newest-to-oldest list of distributions for `project_name`
+        """
+        try:
+            return self._cache[project_name]
+        except KeyError:
+            project_name = project_name.lower()
+            if project_name not in self._distmap:
+                return []
+
+        if project_name not in self._cache:
+            dists = self._cache[project_name] = self._distmap[project_name]
+            _sort_dists(dists)
+
+        return self._cache[project_name]
+
+    def add(self,dist):
+        """Add `dist` if we ``can_add()`` it and it isn't already added"""
+        if self.can_add(dist) and dist.has_version():
+            dists = self._distmap.setdefault(dist.key,[])
+            if dist not in dists:
+                dists.append(dist)
+                if dist.key in self._cache:
+                    _sort_dists(self._cache[dist.key])
+
+
+    def best_match(self, req, working_set, installer=None):
+        """Find distribution best matching `req` and usable on `working_set`
+
+        This calls the ``find(req)`` method of the `working_set` to see if a
+        suitable distribution is already active.  (This may raise
+        ``VersionConflict`` if an unsuitable version of the project is already
+        active in the specified `working_set`.)  If a suitable distribution
+        isn't active, this method returns the newest distribution in the
+        environment that meets the ``Requirement`` in `req`.  If no suitable
+        distribution is found, and `installer` is supplied, then the result of
+        calling the environment's ``obtain(req, installer)`` method will be
+        returned.
+        """
+        dist = working_set.find(req)
+        if dist is not None:
+            return dist
+        for dist in self[req.key]:
+            if dist in req:
+                return dist
+        return self.obtain(req, installer) # try and download/install
+
+    def obtain(self, requirement, installer=None):
+        """Obtain a distribution matching `requirement` (e.g. via download)
+
+        Obtain a distro that matches requirement (e.g. via download).  In the
+        base ``Environment`` class, this routine just returns
+        ``installer(requirement)``, unless `installer` is None, in which case
+        None is returned instead.  This method is a hook that allows subclasses
+        to attempt other ways of obtaining a distribution before falling back
+        to the `installer` argument."""
+        if installer is not None:
+            return installer(requirement)
+
+    def __iter__(self):
+        """Yield the unique project names of the available distributions"""
+        for key in self._distmap.keys():
+            if self[key]: yield key
+
+
+
+
+    def __iadd__(self, other):
+        """In-place addition of a distribution or environment"""
+        if isinstance(other,Distribution):
+            self.add(other)
+        elif isinstance(other,Environment):
+            for project in other:
+                for dist in other[project]:
+                    self.add(dist)
+        else:
+            raise TypeError("Can't add %r to environment" % (other,))
+        return self
+
+    def __add__(self, other):
+        """Add an environment or distribution to an environment"""
+        new = self.__class__([], platform=None, python=None)
+        for env in self, other:
+            new += env
+        return new
+
+
+AvailableDistributions = Environment    # XXX backward compatibility
+
+
+class ExtractionError(RuntimeError):
+    """An error occurred extracting a resource
+
+    The following attributes are available from instances of this exception:
+
+    manager
+        The resource manager that raised this exception
+
+    cache_path
+        The base directory for resource extraction
+
+    original_error
+        The exception instance that caused extraction to fail
+    """
+
+
+
+
+class ResourceManager:
+    """Manage resource extraction and packages"""
+    extraction_path = None
+
+    def __init__(self):
+        self.cached_files = {}
+
+    def resource_exists(self, package_or_requirement, resource_name):
+        """Does the named resource exist?"""
+        return get_provider(package_or_requirement).has_resource(resource_name)
+
+    def resource_isdir(self, package_or_requirement, resource_name):
+        """Is the named resource an existing directory?"""
+        return get_provider(package_or_requirement).resource_isdir(
+            resource_name
+        )
+
+    def resource_filename(self, package_or_requirement, resource_name):
+        """Return a true filesystem path for specified resource"""
+        return get_provider(package_or_requirement).get_resource_filename(
+            self, resource_name
+        )
+
+    def resource_stream(self, package_or_requirement, resource_name):
+        """Return a readable file-like object for specified resource"""
+        return get_provider(package_or_requirement).get_resource_stream(
+            self, resource_name
+        )
+
+    def resource_string(self, package_or_requirement, resource_name):
+        """Return specified resource as a string"""
+        return get_provider(package_or_requirement).get_resource_string(
+            self, resource_name
+        )
+
+    def resource_listdir(self, package_or_requirement, resource_name):
+        """List the contents of the named resource directory"""
+        return get_provider(package_or_requirement).resource_listdir(
+            resource_name
+        )
+
+    def extraction_error(self):
+        """Give an error message for problems extracting file(s)"""
+
+        old_exc = sys.exc_info()[1]
+        cache_path = self.extraction_path or get_default_cache()
+
+        err = ExtractionError("""Can't extract file(s) to egg cache
+
+The following error occurred while trying to extract file(s) to the Python egg
+cache:
+
+  %s
+
+The Python egg cache directory is currently set to:
+
+  %s
+
+Perhaps your account does not have write access to this directory?  You can
+change the cache directory by setting the PYTHON_EGG_CACHE environment
+variable to point to an accessible directory.
+"""         % (old_exc, cache_path)
+        )
+        err.manager        = self
+        err.cache_path     = cache_path
+        err.original_error = old_exc
+        raise err
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+    def get_cache_path(self, archive_name, names=()):
+        """Return absolute location in cache for `archive_name` and `names`
+
+        The parent directory of the resulting path will be created if it does
+        not already exist.  `archive_name` should be the base filename of the
+        enclosing egg (which may not be the name of the enclosing zipfile!),
+        including its ".egg" extension.  `names`, if provided, should be a
+        sequence of path name parts "under" the egg's extraction location.
+
+        This method should only be called by resource providers that need to
+        obtain an extraction location, and only for names they intend to
+        extract, as it tracks the generated names for possible cleanup later.
+        """
+        extract_path = self.extraction_path or get_default_cache()
+        target_path = os.path.join(extract_path, archive_name+'-tmp', *names)
+        try:
+            _bypass_ensure_directory(target_path)
+        except:
+            self.extraction_error()
+
+        self.cached_files[target_path] = 1
+        return target_path
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+    def postprocess(self, tempname, filename):
+        """Perform any platform-specific postprocessing of `tempname`
+
+        This is where Mac header rewrites should be done; other platforms don't
+        have anything special they should do.
+
+        Resource providers should call this method ONLY after successfully
+        extracting a compressed resource.  They must NOT call it on resources
+        that are already in the filesystem.
+
+        `tempname` is the current (temporary) name of the file, and `filename`
+        is the name it will be renamed to by the caller after this routine
+        returns.
+        """
+
+        if os.name == 'posix':
+            # Make the resource executable
+            mode = ((os.stat(tempname).st_mode) | 0555) & 07777
+            os.chmod(tempname, mode)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+    def set_extraction_path(self, path):
+        """Set the base path where resources will be extracted to, if needed.
+
+        If you do not call this routine before any extractions take place, the
+        path defaults to the return value of ``get_default_cache()``.  (Which
+        is based on the ``PYTHON_EGG_CACHE`` environment variable, with various
+        platform-specific fallbacks.  See that routine's documentation for more
+        details.)
+
+        Resources are extracted to subdirectories of this path based upon
+        information given by the ``IResourceProvider``.  You may set this to a
+        temporary directory, but then you must call ``cleanup_resources()`` to
+        delete the extracted files when done.  There is no guarantee that
+        ``cleanup_resources()`` will be able to remove all extracted files.
+
+        (Note: you may not change the extraction path for a given resource
+        manager once resources have been extracted, unless you first call
+        ``cleanup_resources()``.)
+        """
+        if self.cached_files:
+            raise ValueError(
+                "Can't change extraction path, files already extracted"
+            )
+
+        self.extraction_path = path
+
+    def cleanup_resources(self, force=False):
+        """
+        Delete all extracted resource files and directories, returning a list
+        of the file and directory names that could not be successfully removed.
+        This function does not have any concurrency protection, so it should
+        generally only be called when the extraction path is a temporary
+        directory exclusive to a single process.  This method is not
+        automatically called; you must call it explicitly or register it as an
+        ``atexit`` function if you wish to ensure cleanup of a temporary
+        directory used for extractions.
+        """
+        # XXX
+
+
+
+def get_default_cache():
+    """Determine the default cache location
+
+    This returns the ``PYTHON_EGG_CACHE`` environment variable, if set.
+    Otherwise, on Windows, it returns a "Python-Eggs" subdirectory of the
+    "Application Data" directory.  On all other systems, it's "~/.python-eggs".
+    """
+    try:
+        return os.environ['PYTHON_EGG_CACHE']
+    except KeyError:
+        pass
+
+    if os.name!='nt':
+        return os.path.expanduser('~/.python-eggs')
+
+    app_data = 'Application Data'   # XXX this may be locale-specific!
+    app_homes = [
+        (('APPDATA',), None),       # best option, should be locale-safe
+        (('USERPROFILE',), app_data),
+        (('HOMEDRIVE','HOMEPATH'), app_data),
+        (('HOMEPATH',), app_data),
+        (('HOME',), None),
+        (('WINDIR',), app_data),    # 95/98/ME
+    ]
+
+    for keys, subdir in app_homes:
+        dirname = ''
+        for key in keys:
+            if key in os.environ:
+                dirname = os.path.join(dirname, os.environ[key])
+            else:
+                break
+        else:
+            if subdir:
+                dirname = os.path.join(dirname,subdir)
+            return os.path.join(dirname, 'Python-Eggs')
+    else:
+        raise RuntimeError(
+            "Please set the PYTHON_EGG_CACHE enviroment variable"
+        )
+
+def safe_name(name):
+    """Convert an arbitrary string to a standard distribution name
+
+    Any runs of non-alphanumeric/. characters are replaced with a single '-'.
+    """
+    return re.sub('[^A-Za-z0-9.]+', '-', name)
+
+
+def safe_version(version):
+    """Convert an arbitrary string to a standard version string
+
+    Spaces become dots, and all other non-alphanumeric characters become
+    dashes, with runs of multiple dashes condensed to a single dash.
+    """
+    version = version.replace(' ','.')
+    return re.sub('[^A-Za-z0-9.]+', '-', version)
+
+
+def safe_extra(extra):
+    """Convert an arbitrary string to a standard 'extra' name
+
+    Any runs of non-alphanumeric characters are replaced with a single '_',
+    and the result is always lowercased.
+    """
+    return re.sub('[^A-Za-z0-9.]+', '_', extra).lower()
+
+
+def to_filename(name):
+    """Convert a project or version name to its filename-escaped form
+
+    Any '-' characters are currently replaced with '_'.
+    """
+    return name.replace('-','_')
+
+
+
+
+
+
+
+
+class NullProvider:
+    """Try to implement resources and metadata for arbitrary PEP 302 loaders"""
+
+    egg_name = None
+    egg_info = None
+    loader = None
+
+    def __init__(self, module):
+        self.loader = getattr(module, '__loader__', None)
+        self.module_path = os.path.dirname(getattr(module, '__file__', ''))
+
+    def get_resource_filename(self, manager, resource_name):
+        return self._fn(self.module_path, resource_name)
+
+    def get_resource_stream(self, manager, resource_name):
+        return StringIO(self.get_resource_string(manager, resource_name))
+
+    def get_resource_string(self, manager, resource_name):
+        return self._get(self._fn(self.module_path, resource_name))
+
+    def has_resource(self, resource_name):
+        return self._has(self._fn(self.module_path, resource_name))
+
+    def has_metadata(self, name):
+        return self.egg_info and self._has(self._fn(self.egg_info,name))
+
+    if sys.version_info <= (3,):
+        def get_metadata(self, name):
+            if not self.egg_info:
+                return ""
+            return self._get(self._fn(self.egg_info,name))
+    else:
+        def get_metadata(self, name):
+            if not self.egg_info:
+                return ""
+            return self._get(self._fn(self.egg_info,name)).decode("utf-8")
+
+    def get_metadata_lines(self, name):
+        return yield_lines(self.get_metadata(name))
+
+    def resource_isdir(self,resource_name):
+        return self._isdir(self._fn(self.module_path, resource_name))
+
+    def metadata_isdir(self,name):
+        return self.egg_info and self._isdir(self._fn(self.egg_info,name))
+
+
+    def resource_listdir(self,resource_name):
+        return self._listdir(self._fn(self.module_path,resource_name))
+
+    def metadata_listdir(self,name):
+        if self.egg_info:
+            return self._listdir(self._fn(self.egg_info,name))
+        return []
+
+    def run_script(self,script_name,namespace):
+        script = 'scripts/'+script_name
+        if not self.has_metadata(script):
+            raise ResolutionError("No script named %r" % script_name)
+        script_text = self.get_metadata(script).replace('\r\n','\n')
+        script_text = script_text.replace('\r','\n')
+        script_filename = self._fn(self.egg_info,script)
+        namespace['__file__'] = script_filename
+        if os.path.exists(script_filename):
+            execfile(script_filename, namespace, namespace)
+        else:
+            from linecache import cache
+            cache[script_filename] = (
+                len(script_text), 0, script_text.split('\n'), script_filename
+            )
+            script_code = compile(script_text,script_filename,'exec')
+            exec script_code in namespace, namespace
+
+    def _has(self, path):
+        raise NotImplementedError(
+            "Can't perform this operation for unregistered loader type"
+        )
+
+    def _isdir(self, path):
+        raise NotImplementedError(
+            "Can't perform this operation for unregistered loader type"
+        )
+
+    def _listdir(self, path):
+        raise NotImplementedError(
+            "Can't perform this operation for unregistered loader type"
+        )
+
+    def _fn(self, base, resource_name):
+        if resource_name:
+            return os.path.join(base, *resource_name.split('/'))
+        return base
+
+    def _get(self, path):
+        if hasattr(self.loader, 'get_data'):
+            return self.loader.get_data(path)
+        raise NotImplementedError(
+            "Can't perform this operation for loaders without 'get_data()'"
+        )
+
+register_loader_type(object, NullProvider)
+
+
+class EggProvider(NullProvider):
+    """Provider based on a virtual filesystem"""
+
+    def __init__(self,module):
+        NullProvider.__init__(self,module)
+        self._setup_prefix()
+
+    def _setup_prefix(self):
+        # we assume here that our metadata may be nested inside a "basket"
+        # of multiple eggs; that's why we use module_path instead of .archive
+        path = self.module_path
+        old = None
+        while path!=old:
+            if path.lower().endswith('.egg'):
+                self.egg_name = os.path.basename(path)
+                self.egg_info = os.path.join(path, 'EGG-INFO')
+                self.egg_root = path
+                break
+            old = path
+            path, base = os.path.split(path)
+
+
+
+
+
+
+class DefaultProvider(EggProvider):
+    """Provides access to package resources in the filesystem"""
+
+    def _has(self, path):
+        return os.path.exists(path)
+
+    def _isdir(self,path):
+        return os.path.isdir(path)
+
+    def _listdir(self,path):
+        return os.listdir(path)
+
+    def get_resource_stream(self, manager, resource_name):
+        return open(self._fn(self.module_path, resource_name), 'rb')
+
+    def _get(self, path):
+        stream = open(path, 'rb')
+        try:
+            return stream.read()
+        finally:
+            stream.close()
+
+register_loader_type(type(None), DefaultProvider)
+
+
+class EmptyProvider(NullProvider):
+    """Provider that returns nothing for all requests"""
+
+    _isdir = _has = lambda self,path: False
+    _get          = lambda self,path: ''
+    _listdir      = lambda self,path: []
+    module_path   = None
+
+    def __init__(self):
+        pass
+
+empty_provider = EmptyProvider()
+
+
+
+
+class ZipProvider(EggProvider):
+    """Resource support for zips and eggs"""
+
+    eagers = None
+
+    def __init__(self, module):
+        EggProvider.__init__(self,module)
+        self.zipinfo = zipimport._zip_directory_cache[self.loader.archive]
+        self.zip_pre = self.loader.archive+os.sep
+
+    def _zipinfo_name(self, fspath):
+        # Convert a virtual filename (full path to file) into a zipfile subpath
+        # usable with the zipimport directory cache for our target archive
+        if fspath.startswith(self.zip_pre):
+            return fspath[len(self.zip_pre):]
+        raise AssertionError(
+            "%s is not a subpath of %s" % (fspath,self.zip_pre)
+        )
+
+    def _parts(self,zip_path):
+        # Convert a zipfile subpath into an egg-relative path part list
+        fspath = self.zip_pre+zip_path  # pseudo-fs path
+        if fspath.startswith(self.egg_root+os.sep):
+            return fspath[len(self.egg_root)+1:].split(os.sep)
+        raise AssertionError(
+            "%s is not a subpath of %s" % (fspath,self.egg_root)
+        )
+
+    def get_resource_filename(self, manager, resource_name):
+        if not self.egg_name:
+            raise NotImplementedError(
+                "resource_filename() only supported for .egg, not .zip"
+            )
+        # no need to lock for extraction, since we use temp names
+        zip_path = self._resource_to_zip(resource_name)
+        eagers = self._get_eager_resources()
+        if '/'.join(self._parts(zip_path)) in eagers:
+            for name in eagers:
+                self._extract_resource(manager, self._eager_to_zip(name))
+        return self._extract_resource(manager, zip_path)
+
+    def _extract_resource(self, manager, zip_path):
+
+        if zip_path in self._index():
+            for name in self._index()[zip_path]:
+                last = self._extract_resource(
+                    manager, os.path.join(zip_path, name)
+                )
+            return os.path.dirname(last)  # return the extracted directory name
+
+        zip_stat = self.zipinfo[zip_path]
+        t,d,size = zip_stat[5], zip_stat[6], zip_stat[3]
+        date_time = (
+            (d>>9)+1980, (d>>5)&0xF, d&0x1F,                      # ymd
+            (t&0xFFFF)>>11, (t>>5)&0x3F, (t&0x1F) * 2, 0, 0, -1   # hms, etc.
+        )
+        timestamp = time.mktime(date_time)
+
+        try:
+            if not WRITE_SUPPORT:
+                raise IOError('"os.rename" and "os.unlink" are not supported '
+                              'on this platform')
+
+            real_path = manager.get_cache_path(
+                self.egg_name, self._parts(zip_path)
+            )
+
+            if os.path.isfile(real_path):
+                stat = os.stat(real_path)
+                if stat.st_size==size and stat.st_mtime==timestamp:
+                    # size and stamp match, don't bother extracting
+                    return real_path
+
+            outf, tmpnam = _mkstemp(".$extract", dir=os.path.dirname(real_path))
+            os.write(outf, self.loader.get_data(zip_path))
+            os.close(outf)
+            utime(tmpnam, (timestamp,timestamp))
+            manager.postprocess(tmpnam, real_path)
+
+            try:
+                rename(tmpnam, real_path)
+
+            except os.error:
+                if os.path.isfile(real_path):
+                    stat = os.stat(real_path)
+
+                    if stat.st_size==size and stat.st_mtime==timestamp:
+                        # size and stamp match, somebody did it just ahead of
+                        # us, so we're done
+                        return real_path
+                    elif os.name=='nt':     # Windows, del old file and retry
+                        unlink(real_path)
+                        rename(tmpnam, real_path)
+                        return real_path
+                raise
+
+        except os.error:
+            manager.extraction_error()  # report a user-friendly error
+
+        return real_path
+
+    def _get_eager_resources(self):
+        if self.eagers is None:
+            eagers = []
+            for name in ('native_libs.txt', 'eager_resources.txt'):
+                if self.has_metadata(name):
+                    eagers.extend(self.get_metadata_lines(name))
+            self.eagers = eagers
+        return self.eagers
+
+    def _index(self):
+        try:
+            return self._dirindex
+        except AttributeError:
+            ind = {}
+            for path in self.zipinfo:
+                parts = path.split(os.sep)
+                while parts:
+                    parent = os.sep.join(parts[:-1])
+                    if parent in ind:
+                        ind[parent].append(parts[-1])
+                        break
+                    else:
+                        ind[parent] = [parts.pop()]
+            self._dirindex = ind
+            return ind
+
+    def _has(self, fspath):
+        zip_path = self._zipinfo_name(fspath)
+        return zip_path in self.zipinfo or zip_path in self._index()
+
+    def _isdir(self,fspath):
+        return self._zipinfo_name(fspath) in self._index()
+
+    def _listdir(self,fspath):
+        return list(self._index().get(self._zipinfo_name(fspath), ()))
+
+    def _eager_to_zip(self,resource_name):
+        return self._zipinfo_name(self._fn(self.egg_root,resource_name))
+
+    def _resource_to_zip(self,resource_name):
+        return self._zipinfo_name(self._fn(self.module_path,resource_name))
+
+register_loader_type(zipimport.zipimporter, ZipProvider)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+class FileMetadata(EmptyProvider):
+    """Metadata handler for standalone PKG-INFO files
+
+    Usage::
+
+        metadata = FileMetadata("/path/to/PKG-INFO")
+
+    This provider rejects all data and metadata requests except for PKG-INFO,
+    which is treated as existing, and will be the contents of the file at
+    the provided location.
+    """
+
+    def __init__(self,path):
+        self.path = path
+
+    def has_metadata(self,name):
+        return name=='PKG-INFO'
+
+    def get_metadata(self,name):
+        if name=='PKG-INFO':
+            f = open(self.path,'rU')
+            metadata = f.read()
+            f.close()
+            return metadata
+        raise KeyError("No metadata except PKG-INFO is available")
+
+    def get_metadata_lines(self,name):
+        return yield_lines(self.get_metadata(name))
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+class PathMetadata(DefaultProvider):
+    """Metadata provider for egg directories
+
+    Usage::
+
+        # Development eggs:
+
+        egg_info = "/path/to/PackageName.egg-info"
+        base_dir = os.path.dirname(egg_info)
+        metadata = PathMetadata(base_dir, egg_info)
+        dist_name = os.path.splitext(os.path.basename(egg_info))[0]
+        dist = Distribution(basedir,project_name=dist_name,metadata=metadata)
+
+        # Unpacked egg directories:
+
+        egg_path = "/path/to/PackageName-ver-pyver-etc.egg"
+        metadata = PathMetadata(egg_path, os.path.join(egg_path,'EGG-INFO'))
+        dist = Distribution.from_filename(egg_path, metadata=metadata)
+    """
+
+    def __init__(self, path, egg_info):
+        self.module_path = path
+        self.egg_info = egg_info
+
+
+class EggMetadata(ZipProvider):
+    """Metadata provider for .egg files"""
+
+    def __init__(self, importer):
+        """Create a metadata provider from a zipimporter"""
+
+        self.zipinfo = zipimport._zip_directory_cache[importer.archive]
+        self.zip_pre = importer.archive+os.sep
+        self.loader = importer
+        if importer.prefix:
+            self.module_path = os.path.join(importer.archive, importer.prefix)
+        else:
+            self.module_path = importer.archive
+        self._setup_prefix()
+
+
+class ImpWrapper:
+    """PEP 302 Importer that wraps Python's "normal" import algorithm"""
+
+    def __init__(self, path=None):
+        self.path = path
+
+    def find_module(self, fullname, path=None):
+        subname = fullname.split(".")[-1]
+        if subname != fullname and self.path is None:
+            return None
+        if self.path is None:
+            path = None
+        else:
+            path = [self.path]
+        try:
+            file, filename, etc = imp.find_module(subname, path)
+        except ImportError:
+            return None
+        return ImpLoader(file, filename, etc)
+
+
+class ImpLoader:
+    """PEP 302 Loader that wraps Python's "normal" import algorithm"""
+
+    def __init__(self, file, filename, etc):
+        self.file = file
+        self.filename = filename
+        self.etc = etc
+
+    def load_module(self, fullname):
+        try:
+            mod = imp.load_module(fullname, self.file, self.filename, self.etc)
+        finally:
+            if self.file: self.file.close()
+        # Note: we don't set __loader__ because we want the module to look
+        # normal; i.e. this is just a wrapper for standard import machinery
+        return mod
+
+
+
+
+def get_importer(path_item):
+    """Retrieve a PEP 302 "importer" for the given path item
+
+    If there is no importer, this returns a wrapper around the builtin import
+    machinery.  The returned importer is only cached if it was created by a
+    path hook.
+    """
+    try:
+        importer = sys.path_importer_cache[path_item]
+    except KeyError:
+        for hook in sys.path_hooks:
+            try:
+                importer = hook(path_item)
+            except ImportError:
+                pass
+            else:
+                break
+        else:
+            importer = None
+
+    sys.path_importer_cache.setdefault(path_item,importer)
+    if importer is None:
+        try:
+            importer = ImpWrapper(path_item)
+        except ImportError:
+            pass
+    return importer
+
+try:
+    from pkgutil import get_importer, ImpImporter
+except ImportError:
+    pass    # Python 2.3 or 2.4, use our own implementation
+else:
+    ImpWrapper = ImpImporter    # Python 2.5, use pkgutil's implementation
+    del ImpLoader, ImpImporter
+
+
+
+
+
+
+_distribution_finders = {}
+
+def register_finder(importer_type, distribution_finder):
+    """Register `distribution_finder` to find distributions in sys.path items
+
+    `importer_type` is the type or class of a PEP 302 "Importer" (sys.path item
+    handler), and `distribution_finder` is a callable that, passed a path
+    item and the importer instance, yields ``Distribution`` instances found on
+    that path item.  See ``pkg_resources.find_on_path`` for an example."""
+    _distribution_finders[importer_type] = distribution_finder
+
+
+def find_distributions(path_item, only=False):
+    """Yield distributions accessible via `path_item`"""
+    importer = get_importer(path_item)
+    finder = _find_adapter(_distribution_finders, importer)
+    return finder(importer, path_item, only)
+
+def find_in_zip(importer, path_item, only=False):
+    metadata = EggMetadata(importer)
+    if metadata.has_metadata('PKG-INFO'):
+        yield Distribution.from_filename(path_item, metadata=metadata)
+    if only:
+        return  # don't yield nested distros
+    for subitem in metadata.resource_listdir('/'):
+        if subitem.endswith('.egg'):
+            subpath = os.path.join(path_item, subitem)
+            for dist in find_in_zip(zipimport.zipimporter(subpath), subpath):
+                yield dist
+
+register_finder(zipimport.zipimporter, find_in_zip)
+
+def StringIO(*args, **kw):
+    """Thunk to load the real StringIO on demand"""
+    global StringIO
+    try:
+        from cStringIO import StringIO
+    except ImportError:
+        from StringIO import StringIO
+    return StringIO(*args,**kw)
+
+def find_nothing(importer, path_item, only=False):
+    return ()
+register_finder(object,find_nothing)
+
+def find_on_path(importer, path_item, only=False):
+    """Yield distributions accessible on a sys.path directory"""
+    path_item = _normalize_cached(path_item)
+
+    if os.path.isdir(path_item) and os.access(path_item, os.R_OK):
+        if path_item.lower().endswith('.egg'):
+            # unpacked egg
+            yield Distribution.from_filename(
+                path_item, metadata=PathMetadata(
+                    path_item, os.path.join(path_item,'EGG-INFO')
+                )
+            )
+        else:
+            # scan for .egg and .egg-info in directory
+            for entry in os.listdir(path_item):
+                lower = entry.lower()
+                if lower.endswith('.egg-info'):
+                    fullpath = os.path.join(path_item, entry)
+                    if os.path.isdir(fullpath):
+                        # egg-info directory, allow getting metadata
+                        metadata = PathMetadata(path_item, fullpath)
+                    else:
+                        metadata = FileMetadata(fullpath)
+                    yield Distribution.from_location(
+                        path_item,entry,metadata,precedence=DEVELOP_DIST
+                    )
+                elif not only and lower.endswith('.egg'):
+                    for dist in find_distributions(os.path.join(path_item, entry)):
+                        yield dist
+                elif not only and lower.endswith('.egg-link'):
+                    for line in open(os.path.join(path_item, entry)):
+                        if not line.strip(): continue
+                        for item in find_distributions(os.path.join(path_item,line.rstrip())):
+                            yield item
+                        break
+register_finder(ImpWrapper,find_on_path)
+
+_namespace_handlers = {}
+_namespace_packages = {}
+
+def register_namespace_handler(importer_type, namespace_handler):
+    """Register `namespace_handler` to declare namespace packages
+
+    `importer_type` is the type or class of a PEP 302 "Importer" (sys.path item
+    handler), and `namespace_handler` is a callable like this::
+
+        def namespace_handler(importer,path_entry,moduleName,module):
+            # return a path_entry to use for child packages
+
+    Namespace handlers are only called if the importer object has already
+    agreed that it can handle the relevant path item, and they should only
+    return a subpath if the module __path__ does not already contain an
+    equivalent subpath.  For an example namespace handler, see
+    ``pkg_resources.file_ns_handler``.
+    """
+    _namespace_handlers[importer_type] = namespace_handler
+
+def _handle_ns(packageName, path_item):
+    """Ensure that named package includes a subpath of path_item (if needed)"""
+    importer = get_importer(path_item)
+    if importer is None:
+        return None
+    loader = importer.find_module(packageName)
+    if loader is None:
+        return None
+    module = sys.modules.get(packageName)
+    if module is None:
+        module = sys.modules[packageName] = types.ModuleType(packageName)
+        module.__path__ = []; _set_parent_ns(packageName)
+    elif not hasattr(module,'__path__'):
+        raise TypeError("Not a package:", packageName)
+    handler = _find_adapter(_namespace_handlers, importer)
+    subpath = handler(importer,path_item,packageName,module)
+    if subpath is not None:
+        path = module.__path__; path.append(subpath)
+        loader.load_module(packageName); module.__path__ = path
+    return subpath
+
+def declare_namespace(packageName):
+    """Declare that package 'packageName' is a namespace package"""
+
+    imp.acquire_lock()
+    try:
+        if packageName in _namespace_packages:
+            return
+
+        path, parent = sys.path, None
+        if '.' in packageName:
+            parent = '.'.join(packageName.split('.')[:-1])
+            declare_namespace(parent)
+            __import__(parent)
+            try:
+                path = sys.modules[parent].__path__
+            except AttributeError:
+                raise TypeError("Not a package:", parent)
+
+        # Track what packages are namespaces, so when new path items are added,
+        # they can be updated
+        _namespace_packages.setdefault(parent,[]).append(packageName)
+        _namespace_packages.setdefault(packageName,[])
+
+        for path_item in path:
+            # Ensure all the parent's path items are reflected in the child,
+            # if they apply
+            _handle_ns(packageName, path_item)
+
+    finally:
+        imp.release_lock()
+
+def fixup_namespace_packages(path_item, parent=None):
+    """Ensure that previously-declared namespace packages include path_item"""
+    imp.acquire_lock()
+    try:
+        for package in _namespace_packages.get(parent,()):
+            subpath = _handle_ns(package, path_item)
+            if subpath: fixup_namespace_packages(subpath,package)
+    finally:
+        imp.release_lock()
+
+def file_ns_handler(importer, path_item, packageName, module):
+    """Compute an ns-package subpath for a filesystem or zipfile importer"""
+
+    subpath = os.path.join(path_item, packageName.split('.')[-1])
+    normalized = _normalize_cached(subpath)
+    for item in module.__path__:
+        if _normalize_cached(item)==normalized:
+            break
+    else:
+        # Only return the path if it's not already there
+        return subpath
+
+register_namespace_handler(ImpWrapper,file_ns_handler)
+register_namespace_handler(zipimport.zipimporter,file_ns_handler)
+
+
+def null_ns_handler(importer, path_item, packageName, module):
+    return None
+
+register_namespace_handler(object,null_ns_handler)
+
+
+def normalize_path(filename):
+    """Normalize a file/dir name for comparison purposes"""
+    return os.path.normcase(os.path.realpath(filename))
+
+def _normalize_cached(filename,_cache={}):
+    try:
+        return _cache[filename]
+    except KeyError:
+        _cache[filename] = result = normalize_path(filename)
+        return result
+
+def _set_parent_ns(packageName):
+    parts = packageName.split('.')
+    name = parts.pop()
+    if parts:
+        parent = '.'.join(parts)
+        setattr(sys.modules[parent], name, sys.modules[packageName])
+
+
+def yield_lines(strs):
+    """Yield non-empty/non-comment lines of a ``basestring`` or sequence"""
+    if isinstance(strs,basestring):
+        for s in strs.splitlines():
+            s = s.strip()
+            if s and not s.startswith('#'):     # skip blank lines/comments
+                yield s
+    else:
+        for ss in strs:
+            for s in yield_lines(ss):
+                yield s
+
+LINE_END = re.compile(r"\s*(#.*)?$").match         # whitespace and comment
+CONTINUE = re.compile(r"\s*\\\s*(#.*)?$").match    # line continuation
+DISTRO   = re.compile(r"\s*((\w|[-.])+)").match    # Distribution or extra
+VERSION  = re.compile(r"\s*(<=?|>=?|==|!=)\s*((\w|[-.])+)").match  # ver. info
+COMMA    = re.compile(r"\s*,").match               # comma between items
+OBRACKET = re.compile(r"\s*\[").match
+CBRACKET = re.compile(r"\s*\]").match
+MODULE   = re.compile(r"\w+(\.\w+)*$").match
+EGG_NAME = re.compile(
+    r"(?P<name>[^-]+)"
+    r"( -(?P<ver>[^-]+) (-py(?P<pyver>[^-]+) (-(?P<plat>.+))? )? )?",
+    re.VERBOSE | re.IGNORECASE
+).match
+
+component_re = re.compile(r'(\d+ | [a-z]+ | \.| -)', re.VERBOSE)
+replace = {'pre':'c', 'preview':'c','-':'final-','rc':'c','dev':'@'}.get
+
+def _parse_version_parts(s):
+    for part in component_re.split(s):
+        part = replace(part,part)
+        if not part or part=='.':
+            continue
+        if part[:1] in '0123456789':
+            yield part.zfill(8)    # pad for numeric comparison
+        else:
+            yield '*'+part
+
+    yield '*final'  # ensure that alpha/beta/candidate are before final
+
+def parse_version(s):
+    """Convert a version string to a chronologically-sortable key
+
+    This is a rough cross between distutils' StrictVersion and LooseVersion;
+    if you give it versions that would work with StrictVersion, then it behaves
+    the same; otherwise it acts like a slightly-smarter LooseVersion. It is
+    *possible* to create pathological version coding schemes that will fool
+    this parser, but they should be very rare in practice.
+
+    The returned value will be a tuple of strings.  Numeric portions of the
+    version are padded to 8 digits so they will compare numerically, but
+    without relying on how numbers compare relative to strings.  Dots are
+    dropped, but dashes are retained.  Trailing zeros between alpha segments
+    or dashes are suppressed, so that e.g. "2.4.0" is considered the same as
+    "2.4". Alphanumeric parts are lower-cased.
+
+    The algorithm assumes that strings like "-" and any alpha string that
+    alphabetically follows "final"  represents a "patch level".  So, "2.4-1"
+    is assumed to be a branch or patch of "2.4", and therefore "2.4.1" is
+    considered newer than "2.4-1", which in turn is newer than "2.4".
+
+    Strings like "a", "b", "c", "alpha", "beta", "candidate" and so on (that
+    come before "final" alphabetically) are assumed to be pre-release versions,
+    so that the version "2.4" is considered newer than "2.4a1".
+
+    Finally, to handle miscellaneous cases, the strings "pre", "preview", and
+    "rc" are treated as if they were "c", i.e. as though they were release
+    candidates, and therefore are not as new as a version string that does not
+    contain them, and "dev" is replaced with an '@' so that it sorts lower than
+    than any other pre-release tag.
+    """
+    parts = []
+    for part in _parse_version_parts(s.lower()):
+        if part.startswith('*'):
+            if part<'*final':   # remove '-' before a prerelease tag
+                while parts and parts[-1]=='*final-': parts.pop()
+            # remove trailing zeros from each series of numeric parts
+            while parts and parts[-1]=='00000000':
+                parts.pop()
+        parts.append(part)
+    return tuple(parts)
+
+class EntryPoint(object):
+    """Object representing an advertised importable object"""
+
+    def __init__(self, name, module_name, attrs=(), extras=(), dist=None):
+        if not MODULE(module_name):
+            raise ValueError("Invalid module name", module_name)
+        self.name = name
+        self.module_name = module_name
+        self.attrs = tuple(attrs)
+        self.extras = Requirement.parse(("x[%s]" % ','.join(extras))).extras
+        self.dist = dist
+
+    def __str__(self):
+        s = "%s = %s" % (self.name, self.module_name)
+        if self.attrs:
+            s += ':' + '.'.join(self.attrs)
+        if self.extras:
+            s += ' [%s]' % ','.join(self.extras)
+        return s
+
+    def __repr__(self):
+        return "EntryPoint.parse(%r)" % str(self)
+
+    def load(self, require=True, env=None, installer=None):
+        if require: self.require(env, installer)
+        entry = __import__(self.module_name, globals(),globals(), ['__name__'])
+        for attr in self.attrs:
+            try:
+                entry = getattr(entry,attr)
+            except AttributeError:
+                raise ImportError("%r has no %r attribute" % (entry,attr))
+        return entry
+
+    def require(self, env=None, installer=None):
+        if self.extras and not self.dist:
+            raise UnknownExtra("Can't require() without a distribution", self)
+        map(working_set.add,
+            working_set.resolve(self.dist.requires(self.extras),env,installer))
+
+
+
+    #@classmethod
+    def parse(cls, src, dist=None):
+        """Parse a single entry point from string `src`
+
+        Entry point syntax follows the form::
+
+            name = some.module:some.attr [extra1,extra2]
+
+        The entry name and module name are required, but the ``:attrs`` and
+        ``[extras]`` parts are optional
+        """
+        try:
+            attrs = extras = ()
+            name,value = src.split('=',1)
+            if '[' in value:
+                value,extras = value.split('[',1)
+                req = Requirement.parse("x["+extras)
+                if req.specs: raise ValueError
+                extras = req.extras
+            if ':' in value:
+                value,attrs = value.split(':',1)
+                if not MODULE(attrs.rstrip()):
+                    raise ValueError
+                attrs = attrs.rstrip().split('.')
+        except ValueError:
+            raise ValueError(
+                "EntryPoint must be in 'name=module:attrs [extras]' format",
+                src
+            )
+        else:
+            return cls(name.strip(), value.strip(), attrs, extras, dist)
+
+    parse = classmethod(parse)
+
+
+
+
+
+
+
+
+    #@classmethod
+    def parse_group(cls, group, lines, dist=None):
+        """Parse an entry point group"""
+        if not MODULE(group):
+            raise ValueError("Invalid group name", group)
+        this = {}
+        for line in yield_lines(lines):
+            ep = cls.parse(line, dist)
+            if ep.name in this:
+                raise ValueError("Duplicate entry point", group, ep.name)
+            this[ep.name]=ep
+        return this
+
+    parse_group = classmethod(parse_group)
+
+    #@classmethod
+    def parse_map(cls, data, dist=None):
+        """Parse a map of entry point groups"""
+        if isinstance(data,dict):
+            data = data.items()
+        else:
+            data = split_sections(data)
+        maps = {}
+        for group, lines in data:
+            if group is None:
+                if not lines:
+                    continue
+                raise ValueError("Entry points must be listed in groups")
+            group = group.strip()
+            if group in maps:
+                raise ValueError("Duplicate group name", group)
+            maps[group] = cls.parse_group(group, lines, dist)
+        return maps
+
+    parse_map = classmethod(parse_map)
+
+
+def _remove_md5_fragment(location):
+    if not location:
+        return ''
+    parsed = urlparse(location)
+    if parsed[-1].startswith('md5='):
+        return urlunparse(parsed[:-1] + ('',))
+    return location
+
+
+class Distribution(object):
+    """Wrap an actual or potential sys.path entry w/metadata"""
+    def __init__(self,
+        location=None, metadata=None, project_name=None, version=None,
+        py_version=PY_MAJOR, platform=None, precedence = EGG_DIST
+    ):
+        self.project_name = safe_name(project_name or 'Unknown')
+        if version is not None:
+            self._version = safe_version(version)
+        self.py_version = py_version
+        self.platform = platform
+        self.location = location
+        self.precedence = precedence
+        self._provider = metadata or empty_provider
+
+    #@classmethod
+    def from_location(cls,location,basename,metadata=None,**kw):
+        project_name, version, py_version, platform = [None]*4
+        basename, ext = os.path.splitext(basename)
+        if ext.lower() in (".egg",".egg-info"):
+            match = EGG_NAME(basename)
+            if match:
+                project_name, version, py_version, platform = match.group(
+                    'name','ver','pyver','plat'
+                )
+        return cls(
+            location, metadata, project_name=project_name, version=version,
+            py_version=py_version, platform=platform, **kw
+        )
+    from_location = classmethod(from_location)
+
+
+    hashcmp = property(
+        lambda self: (
+            getattr(self,'parsed_version',()),
+            self.precedence,
+            self.key,
+            _remove_md5_fragment(self.location),
+            self.py_version,
+            self.platform
+        )
+    )
+    def __hash__(self): return hash(self.hashcmp)
+    def __lt__(self, other):
+        return self.hashcmp < other.hashcmp
+    def __le__(self, other):
+        return self.hashcmp <= other.hashcmp
+    def __gt__(self, other):
+        return self.hashcmp > other.hashcmp
+    def __ge__(self, other):
+        return self.hashcmp >= other.hashcmp
+    def __eq__(self, other):
+        if not isinstance(other, self.__class__):
+            # It's not a Distribution, so they are not equal
+            return False
+        return self.hashcmp == other.hashcmp
+    def __ne__(self, other):
+        return not self == other
+
+    # These properties have to be lazy so that we don't have to load any
+    # metadata until/unless it's actually needed.  (i.e., some distributions
+    # may not know their name or version without loading PKG-INFO)
+
+    #@property
+    def key(self):
+        try:
+            return self._key
+        except AttributeError:
+            self._key = key = self.project_name.lower()
+            return key
+    key = property(key)
+
+    #@property
+    def parsed_version(self):
+        try:
+            return self._parsed_version
+        except AttributeError:
+            self._parsed_version = pv = parse_version(self.version)
+            return pv
+
+    parsed_version = property(parsed_version)
+
+    #@property
+    def version(self):
+        try:
+            return self._version
+        except AttributeError:
+            for line in self._get_metadata('PKG-INFO'):
+                if line.lower().startswith('version:'):
+                    self._version = safe_version(line.split(':',1)[1].strip())
+                    return self._version
+            else:
+                raise ValueError(
+                    "Missing 'Version:' header and/or PKG-INFO file", self
+                )
+    version = property(version)
+
+
+
+
+    #@property
+    def _dep_map(self):
+        try:
+            return self.__dep_map
+        except AttributeError:
+            dm = self.__dep_map = {None: []}
+            for name in 'requires.txt', 'depends.txt':
+                for extra,reqs in split_sections(self._get_metadata(name)):
+                    if extra: extra = safe_extra(extra)
+                    dm.setdefault(extra,[]).extend(parse_requirements(reqs))
+            return dm
+    _dep_map = property(_dep_map)
+
+    def requires(self,extras=()):
+        """List of Requirements needed for this distro if `extras` are used"""
+        dm = self._dep_map
+        deps = []
+        deps.extend(dm.get(None,()))
+        for ext in extras:
+            try:
+                deps.extend(dm[safe_extra(ext)])
+            except KeyError:
+                raise UnknownExtra(
+                    "%s has no such extra feature %r" % (self, ext)
+                )
+        return deps
+
+    def _get_metadata(self,name):
+        if self.has_metadata(name):
+            for line in self.get_metadata_lines(name):
+                yield line
+
+    def activate(self,path=None):
+        """Ensure distribution is importable on `path` (default=sys.path)"""
+        if path is None: path = sys.path
+        self.insert_on(path)
+        if path is sys.path:
+            fixup_namespace_packages(self.location)
+            map(declare_namespace, self._get_metadata('namespace_packages.txt'))
+
+
+    def egg_name(self):
+        """Return what this distribution's standard .egg filename should be"""
+        filename = "%s-%s-py%s" % (
+            to_filename(self.project_name), to_filename(self.version),
+            self.py_version or PY_MAJOR
+        )
+
+        if self.platform:
+            filename += '-'+self.platform
+        return filename
+
+    def __repr__(self):
+        if self.location:
+            return "%s (%s)" % (self,self.location)
+        else:
+            return str(self)
+
+    def __str__(self):
+        try: version = getattr(self,'version',None)
+        except ValueError: version = None
+        version = version or "[unknown version]"
+        return "%s %s" % (self.project_name,version)
+
+    def __getattr__(self,attr):
+        """Delegate all unrecognized public attributes to .metadata provider"""
+        if attr.startswith('_'):
+            raise AttributeError,attr
+        return getattr(self._provider, attr)
+
+    #@classmethod
+    def from_filename(cls,filename,metadata=None, **kw):
+        return cls.from_location(
+            _normalize_cached(filename), os.path.basename(filename), metadata,
+            **kw
+        )
+    from_filename = classmethod(from_filename)
+
+    def as_requirement(self):
+        """Return a ``Requirement`` that matches this distribution exactly"""
+        return Requirement.parse('%s==%s' % (self.project_name, self.version))
+
+    def load_entry_point(self, group, name):
+        """Return the `name` entry point of `group` or raise ImportError"""
+        ep = self.get_entry_info(group,name)
+        if ep is None:
+            raise ImportError("Entry point %r not found" % ((group,name),))
+        return ep.load()
+
+    def get_entry_map(self, group=None):
+        """Return the entry point map for `group`, or the full entry map"""
+        try:
+            ep_map = self._ep_map
+        except AttributeError:
+            ep_map = self._ep_map = EntryPoint.parse_map(
+                self._get_metadata('entry_points.txt'), self
+            )
+        if group is not None:
+            return ep_map.get(group,{})
+        return ep_map
+
+    def get_entry_info(self, group, name):
+        """Return the EntryPoint object for `group`+`name`, or ``None``"""
+        return self.get_entry_map(group).get(name)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+    def insert_on(self, path, loc = None):
+        """Insert self.location in path before its nearest parent directory"""
+
+        loc = loc or self.location
+
+        if self.project_name == 'setuptools':
+            try:
+                version = self.version
+            except ValueError:
+                version = ''
+            if version.startswith('0.7'):
+                raise ValueError(
+                    "A 0.7-series setuptools cannot be installed "
+                    "with distribute. Found one at %s" % str(self.location))
+
+        if not loc:
+            return
+
+        if path is sys.path:
+            self.check_version_conflict()
+
+        nloc = _normalize_cached(loc)
+        bdir = os.path.dirname(nloc)
+        npath= map(_normalize_cached, path)
+
+        bp = None
+        for p, item in enumerate(npath):
+            if item==nloc:
+                break
+            elif item==bdir and self.precedence==EGG_DIST:
+                # if it's an .egg, give it precedence over its directory
+                path.insert(p, loc)
+                npath.insert(p, nloc)
+                break
+        else:
+            path.append(loc)
+            return
+
+        # p is the spot where we found or inserted loc; now remove duplicates
+        while 1:
+            try:
+                np = npath.index(nloc, p+1)
+            except ValueError:
+                break
+            else:
+                del npath[np], path[np]
+                p = np  # ha!
+
+        return
+
+
+
+    def check_version_conflict(self):
+        if self.key=='distribute':
+            return      # ignore the inevitable setuptools self-conflicts  :(
+
+        nsp = dict.fromkeys(self._get_metadata('namespace_packages.txt'))
+        loc = normalize_path(self.location)
+        for modname in self._get_metadata('top_level.txt'):
+            if (modname not in sys.modules or modname in nsp
+                or modname in _namespace_packages
+            ):
+                continue
+            if modname in ('pkg_resources', 'setuptools', 'site'):
+                continue
+            fn = getattr(sys.modules[modname], '__file__', None)
+            if fn and (normalize_path(fn).startswith(loc) or
+                       fn.startswith(self.location)):
+                continue
+            issue_warning(
+                "Module %s was already imported from %s, but %s is being added"
+                " to sys.path" % (modname, fn, self.location),
+            )
+
+    def has_version(self):
+        try:
+            self.version
+        except ValueError:
+            issue_warning("Unbuilt egg for "+repr(self))
+            return False
+        return True
+
+    def clone(self,**kw):
+        """Copy this distribution, substituting in any changed keyword args"""
+        for attr in (
+            'project_name', 'version', 'py_version', 'platform', 'location',
+            'precedence'
+        ):
+            kw.setdefault(attr, getattr(self,attr,None))
+        kw.setdefault('metadata', self._provider)
+        return self.__class__(**kw)
+
+
+
+
+    #@property
+    def extras(self):
+        return [dep for dep in self._dep_map if dep]
+    extras = property(extras)
+
+
+def issue_warning(*args,**kw):
+    level = 1
+    g = globals()
+    try:
+        # find the first stack frame that is *not* code in
+        # the pkg_resources module, to use for the warning
+        while sys._getframe(level).f_globals is g:
+            level += 1
+    except ValueError:
+        pass
+    from warnings import warn
+    warn(stacklevel = level+1, *args, **kw)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+def parse_requirements(strs):
+    """Yield ``Requirement`` objects for each specification in `strs`
+
+    `strs` must be an instance of ``basestring``, or a (possibly-nested)
+    iterable thereof.
+    """
+    # create a steppable iterator, so we can handle \-continuations
+    lines = iter(yield_lines(strs))
+
+    def scan_list(ITEM,TERMINATOR,line,p,groups,item_name):
+
+        items = []
+
+        while not TERMINATOR(line,p):
+            if CONTINUE(line,p):
+                try:
+                    line = lines.next(); p = 0
+                except StopIteration:
+                    raise ValueError(
+                        "\\ must not appear on the last nonblank line"
+                    )
+
+            match = ITEM(line,p)
+            if not match:
+                raise ValueError("Expected "+item_name+" in",line,"at",line[p:])
+
+            items.append(match.group(*groups))
+            p = match.end()
+
+            match = COMMA(line,p)
+            if match:
+                p = match.end() # skip the comma
+            elif not TERMINATOR(line,p):
+                raise ValueError(
+                    "Expected ',' or end-of-list in",line,"at",line[p:]
+                )
+
+        match = TERMINATOR(line,p)
+        if match: p = match.end()   # skip the terminator, if any
+        return line, p, items
+
+    for line in lines:
+        match = DISTRO(line)
+        if not match:
+            raise ValueError("Missing distribution spec", line)
+        project_name = match.group(1)
+        p = match.end()
+        extras = []
+
+        match = OBRACKET(line,p)
+        if match:
+            p = match.end()
+            line, p, extras = scan_list(
+                DISTRO, CBRACKET, line, p, (1,), "'extra' name"
+            )
+
+        line, p, specs = scan_list(VERSION,LINE_END,line,p,(1,2),"version spec")
+        specs = [(op,safe_version(val)) for op,val in specs]
+        yield Requirement(project_name, specs, extras)
+
+
+def _sort_dists(dists):
+    tmp = [(dist.hashcmp,dist) for dist in dists]
+    tmp.sort()
+    dists[::-1] = [d for hc,d in tmp]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+class Requirement:
+    def __init__(self, project_name, specs, extras):
+        """DO NOT CALL THIS UNDOCUMENTED METHOD; use Requirement.parse()!"""
+        self.unsafe_name, project_name = project_name, safe_name(project_name)
+        self.project_name, self.key = project_name, project_name.lower()
+        index = [(parse_version(v),state_machine[op],op,v) for op,v in specs]
+        index.sort()
+        self.specs = [(op,ver) for parsed,trans,op,ver in index]
+        self.index, self.extras = index, tuple(map(safe_extra,extras))
+        self.hashCmp = (
+            self.key, tuple([(op,parsed) for parsed,trans,op,ver in index]),
+            frozenset(self.extras)
+        )
+        self.__hash = hash(self.hashCmp)
+
+    def __str__(self):
+        specs = ','.join([''.join(s) for s in self.specs])
+        extras = ','.join(self.extras)
+        if extras: extras = '[%s]' % extras
+        return '%s%s%s' % (self.project_name, extras, specs)
+
+    def __eq__(self,other):
+        return isinstance(other,Requirement) and self.hashCmp==other.hashCmp
+
+    def __contains__(self,item):
+        if isinstance(item,Distribution):
+            if item.key <> self.key: return False
+            if self.index: item = item.parsed_version  # only get if we need it
+        elif isinstance(item,basestring):
+            item = parse_version(item)
+        last = None
+        compare = lambda a, b: (a > b) - (a < b) # -1, 0, 1
+        for parsed,trans,op,ver in self.index:
+            action = trans[compare(item,parsed)] # Indexing: 0, 1, -1
+            if action=='F':     return False
+            elif action=='T':   return True
+            elif action=='+':   last = True
+            elif action=='-' or last is None:   last = False
+        if last is None: last = True    # no rules encountered
+        return last
+
+
+    def __hash__(self):
+        return self.__hash
+
+    def __repr__(self): return "Requirement.parse(%r)" % str(self)
+
+    #@staticmethod
+    def parse(s, replacement=True):
+        reqs = list(parse_requirements(s))
+        if reqs:
+            if len(reqs) == 1:
+                founded_req = reqs[0]
+                # if asked for setuptools distribution
+                # and if distribute is installed, we want to give
+                # distribute instead
+                if _override_setuptools(founded_req) and replacement:
+                    distribute = list(parse_requirements('distribute'))
+                    if len(distribute) == 1:
+                        return distribute[0]
+                    return founded_req
+                else:
+                    return founded_req
+
+            raise ValueError("Expected only one requirement", s)
+        raise ValueError("No requirements found", s)
+
+    parse = staticmethod(parse)
+
+state_machine = {
+    #       =><
+    '<' :  '--T',
+    '<=':  'T-T',
+    '>' :  'F+F',
+    '>=':  'T+F',
+    '==':  'T..',
+    '!=':  'F++',
+}
+
+
+def _override_setuptools(req):
+    """Return True when distribute wants to override a setuptools dependency.
+
+    We want to override when the requirement is setuptools and the version is
+    a variant of 0.6.
+
+    """
+    if req.project_name == 'setuptools':
+        if not len(req.specs):
+            # Just setuptools: ok
+            return True
+        for comparator, version in req.specs:
+            if comparator in ['==', '>=', '>']:
+                if version.startswith('0.7'):
+                    # We want some setuptools not from the 0.6 series.
+                    return False
+        return True
+    return False
+
+
+def _get_mro(cls):
+    """Get an mro for a type or classic class"""
+    if not isinstance(cls,type):
+        class cls(cls,object): pass
+        return cls.__mro__[1:]
+    return cls.__mro__
+
+def _find_adapter(registry, ob):
+    """Return an adapter factory for `ob` from `registry`"""
+    for t in _get_mro(getattr(ob, '__class__', type(ob))):
+        if t in registry:
+            return registry[t]
+
+
+def ensure_directory(path):
+    """Ensure that the parent directory of `path` exists"""
+    dirname = os.path.dirname(path)
+    if not os.path.isdir(dirname):
+        os.makedirs(dirname)
+
+def split_sections(s):
+    """Split a string or iterable thereof into (section,content) pairs
+
+    Each ``section`` is a stripped version of the section header ("[section]")
+    and each ``content`` is a list of stripped lines excluding blank lines and
+    comment-only lines.  If there are any such lines before the first section
+    header, they're returned in a first ``section`` of ``None``.
+    """
+    section = None
+    content = []
+    for line in yield_lines(s):
+        if line.startswith("["):
+            if line.endswith("]"):
+                if section or content:
+                    yield section, content
+                section = line[1:-1].strip()
+                content = []
+            else:
+                raise ValueError("Invalid section heading", line)
+        else:
+            content.append(line)
+
+    # wrap up last segment
+    yield section, content
+
+def _mkstemp(*args,**kw):
+    from tempfile import mkstemp
+    old_open = os.open
+    try:
+        os.open = os_open   # temporarily bypass sandboxing
+        return mkstemp(*args,**kw)
+    finally:
+        os.open = old_open  # and then put it back
+
+
+# Set up global resource manager
+_manager = ResourceManager()
+def _initialize(g):
+    for name in dir(_manager):
+        if not name.startswith('_'):
+            g[name] = getattr(_manager, name)
+_initialize(globals())
+
+# Prepare the master working set and make the ``require()`` API available
+working_set = WorkingSet()
+try:
+    # Does the main program list any requirements?
+    from __main__ import __requires__
+except ImportError:
+    pass # No: just use the default working set based on sys.path
+else:
+    # Yes: ensure the requirements are met, by prefixing sys.path if necessary
+    try:
+        working_set.require(__requires__)
+    except VersionConflict:     # try it without defaults already on sys.path
+        working_set = WorkingSet([])    # by starting with an empty path
+        for dist in working_set.resolve(
+            parse_requirements(__requires__), Environment()
+        ):
+            working_set.add(dist)
+        for entry in sys.path:  # add any missing entries from sys.path
+            if entry not in working_set.entries:
+                working_set.add_entry(entry)
+        sys.path[:] = working_set.entries   # then copy back to sys.path
+
+require = working_set.require
+iter_entry_points = working_set.iter_entry_points
+add_activation_listener = working_set.subscribe
+run_script = working_set.run_script
+run_main = run_script   # backward compatibility
+# Activate all distributions already on sys.path, and ensure that
+# all distributions added to the working set in the future (e.g. by
+# calling ``require()``) will get activated as well.
+add_activation_listener(lambda dist: dist.activate())
+working_set.entries=[]; map(working_set.add_entry,sys.path) # match order
+