Posted to common-issues@hadoop.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2024/04/02 18:28:00 UTC

[jira] [Commented] (HADOOP-18830) S3A: Cut S3 Select

    [ https://issues.apache.org/jira/browse/HADOOP-18830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833281#comment-17833281 ] 

Dongjoon Hyun commented on HADOOP-18830:
----------------------------------------

Hi, can we remove `3.4.1` from the `Fixed Version` field because `3.4.0` is already there? According to the commit log, `3.4.0` looks correct to me.
{code}
$ git log --oneline -n16
2f0dd7c4feb1 (HEAD -> branch-3.4.0, origin/branch-3.4.0) HADOOP-19112. Hadoop 3.4.0 release wrap-up. (#6640) Contributed by Shilun Fan.
f62f116b75dc HDFS-17299. Adding rack failure tolerance when creating a new file  (#6566)
bd8b77f398f6 (tag: release-3.4.0-RC3, tag: rel/release-3.4.0) HADOOP-19099. Add Protobuf Compatibility Notes (#6607) Contributed by Shilun Fan.
5ed3a27df041 HADOOP-19084. Prune hadoop-common transitive dependencies (#6574)
253afde7b2d0 HADOOP-18088. Replace log4j 1.x with reload4j. (#4052)
88fbe62f27e8 (tag: release-3.4.0-RC2) HADOOP-19069. Use hadoop-thirdparty 1.2.0. (#6533) Contributed by Shilun Fan
f4a44d2b3562 HADOOP-18980. S3A credential provider remapping: make extensible (#6406)
893b9efb5398 HADOOP-19059. S3A: Update AWS Java SDK to 2.23.19 (#6538)
836f0aeadc2c HADOOP-18993. Add option fs.s3a.classloader.isolation (#6301)
c8ee14a2eb2f HADOOP-19045. CreateSession Timeout - followup (#6532)
65674b977e7e HDFS-17370. Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf (#6522)
d6c66f76aeb1 HADOOP-19049. Fix StatisticsDataReferenceCleaner classloader leak (#6488)
fd0d0c90d964 HADOOP-19044. S3A: AWS SDK V2 - Update region logic (#6479)
fdbc67b9a8f5 HADOOP-18987. Various fixes to FileSystem API docs (#6292)
c8a17b09c33d HDFS-17359. EC: recheck failed streamers should only after flushing all packets. (#6503). Contributed by farmmamba.
fbea6a6d103e HADOOP-18830. Cut S3 Select (#6144)
{code}
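
A minimal sketch of how the same check could be scripted, assuming commit subjects follow the `ISSUE-KEY. Summary` convention visible in the log above. `has_issue` is a hypothetical helper name, not part of git or Hadoop; it only reads one-line log text from stdin.

```shell
# Hypothetical helper: exit 0 if the one-line git log on stdin contains a
# commit whose subject starts with the given issue key, e.g. "HADOOP-18830. ".
# An optional "(HEAD -> ..., tag: ...)" decoration after the hash is skipped.
has_issue() {
  grep -qE "^[0-9a-f]+ (\([^)]*\) )?$1\. "
}

# Usage, run inside a Hadoop checkout (tag name taken from the log above):
#   git log --oneline rel/release-3.4.0 | has_issue HADOOP-18830 && echo "in 3.4.0"
# An alternative that checks ancestry of the commit hash directly:
#   git merge-base --is-ancestor fbea6a6d103e rel/release-3.4.0 && echo "in 3.4.0"
```

Matching on the trailing `. ` keeps a shorter key such as `HADOOP-1883` from matching `HADOOP-18830`.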

> S3A: Cut S3 Select
> ------------------
>
>                 Key: HADOOP-18830
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18830
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9, 3.5.0, 3.4.1
>
>
> Getting S3 Select to work with the v2 SDK is tricky: we need to add extra libraries to the classpath beyond just bundle.jar. We can do this, but
> * AFAIK nobody has ever done CSV predicate pushdown, as it breaks split logic completely
> * CSV is a bad format
> * one-line JSON is more structured but also far less efficient
> ORC/Parquet benefit from vectored IO and work spanning the cluster.
> Accordingly, I'm wondering what to do about S3 Select:
> # cut?
> # downgrade to optional and document the extra classes on the classpath
> Option #2 is straightforward and effectively the default. We can also declare the feature deprecated.
> {code}
> [ERROR] testReadLandsatRecordsNoMatch(org.apache.hadoop.fs.s3a.select.ITestS3SelectLandsat)  Time elapsed: 147.958 s  <<< ERROR!
> java.io.IOException: java.lang.NoClassDefFoundError: software/amazon/eventstream/MessageDecoder
>         at org.apache.hadoop.fs.s3a.select.SelectObjectContentHelper.select(SelectObjectContentHelper.java:75)
>         at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$select$10(WriteOperationHelper.java:660)
>         at org.apache.hadoop.fs.store.audit.AuditingFunctions.lambda$withinAuditSpan$0(AuditingFunctions.java:62)
>         at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
> {code}


