You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@sdap.apache.org by gi...@apache.org on 2023/03/13 17:48:35 UTC
[incubator-sdap-in-situ-data-services] branch dependabot/pip/pyspark-3.2.2 updated (9862c31 -> 40b707c)
This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a change to branch dependabot/pip/pyspark-3.2.2
in repository https://gitbox.apache.org/repos/asf/incubator-sdap-in-situ-data-services.git
omit 9862c31 chore(deps): bump pyspark from 3.1.2 to 3.2.2
add aba83f0 feat: add CLI script
add 4f72eb3 chore: update readme
add 8d78b1b chore: add changelog
add 0c1c4d1 fix: added `meta` column as defaul column
add 7221ffe fix: add default column in correct place
add c2a6d2a chore: update changelog + swagger
add c97b9b2 chore: merged from apache master
add 65c4651 chore: merge from apache master
add bc2e14a chore: switch to SDAP ticket
add 57ed8dc Merge branch 'master' of github.com:wphyojpl/incubator-sdap-in-situ-data-services
add 786d92a Merge branch 'master' of github.com:apache/incubator-sdap-in-situ-data-services
add 982d498 feat: add ci/cd to build lambda zip file
add 3d0fe1c breaking:Elasticsearch Logic (#1)
add 576b1c1 fix: get lambda function working
add 8d374b0 fix: throw runtime err when ES ingest fail
add f2095f3 chore: add lambda logger
add e304425 feat: use cdms_schema json to create spark struct obj
add 9824c6e chore: merge from master
add c45898c chore: use insitu schema json to get column names
add c001092 feat: add observation type counts
add f33bf06 fix: add missig param in calling stats retriever
add eeff2b9 fix: add observation agg in query
add 6398704 fix: get parquet stat to ES working for SQS multiple records input
add 69c4e39 fix: not throwing error when deleting items that do not exist in ES
add 9e30423 fix: allow config w/o checking mandatory variables
add 0fbb6a0 feat: download small parquet file to extract stat locally
add 07f6000 fix: use singleton to re-use session to reduce time
add e31e246 fix: return NULL if query not found in ES + enhance statistics endpoint to include depth, time and bbox
add c77c943 fix: use unquote_plus to replace `+` to ` ` for s3 url
add acc410f feat: not waiting for ingest to finish
add 9800a38 fix: ci/cd for lambda-docker
add fbd7f20 fix: get lambda with ECS working with pyspark
add f7d28d6 fix: add missing files
add 040776c fix: update makefile
add 3adf08a fix: use pandas to avoid int + float in source data
add 1d83a52 fix:s3 ingestion lambda works as a zip now
add 09f82be feat: make 30x30 tiles + different log level for spark vs cdms flask
add 4d8a2e8 fix: query-missing-depth = float, log statement when deleting ES, ingest w/o pandas + mandatory depth conversion to float
add 8b046e0 fix: UnsupportedOperationException:org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary workaround
add 858e23a fix: wind from & to is in long which screws with the schema
add feaf344 fix: make unique spark-app-name
add cefb0f0 feat: use alias instead of real index
add f3d7466 fix: update test constants to use alias
add 2daba45 feat: Configurable partitioning (#3)
add 69da174 Merge branch 'es.branch' of github.com:wphyojpl/incubator-sdap-in-situ-data-services into es.branch
add 5c70764 fix: config is in string form. not in int form
add e03a862 fix: config is in string form. not in int form
add c8779b1 feat: add elasticsearch index for ddb table
add 7eab73f chore: add missing test file
add 140d017 feat: add ES based metadata table
add 32de57e feat: using ES for metadata
add 8c6b786 fix: need to pass empty str, not None
add 1209603 fix: parallel validator bug
add c59d8db Merge pull request #12 from wphyojpl/es.branch
add 40b707c chore(deps): bump pyspark from 3.1.2 to 3.2.2
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
* -- * -- B -- O -- O -- O (9862c31)
\
N -- N -- N refs/heads/dependabot/pip/pyspark-3.2.2 (40b707c)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
No new revisions were added by this update.
Summary of changes:
.gitignore | 4 +-
ci.cd/Makefile | 41 +
ci.cd/create_s3_zip.sh | 27 +
ci.cd/lambda_docker_upload.sh | 6 +
ci.cd/local_upload.sh | 8 +
docker/parquet.lambda.Dockerfile | 43 +
documentations/navair.demo.md | 106 ++
etc/elasticsearch/all_alias.json | 7 +
etc/elasticsearch/entry_file_records.json | 20 +
etc/elasticsearch/parquet_stats_v1.json | 64 +
etc/elasticsearch/setup_es.txt | 15 +
etc/lambda-spark/spark-class | 8 +
etc/lambda-spark/spark-defaults.conf | 4 +
k8s_spark/k8s_spark/org.alues.yaml | 731 ++++++++
k8s_spark/nohup.out | 4 +
.../parquet.spark.helm/charts/spark-5.9.4.tgz | Bin 0 -> 36223 bytes
k8s_spark/parquet.spark.helm/nohup.out | 1973 ++++++++++++++++++++
.../parquet.spark.helm/templates/deployment.yaml | 4 +
k8s_spark/parquet.spark.helm/values.yaml | 10 +-
nohup.out | 4 +
one_offs/local_flask.py | 16 +
one_offs/py_geo_hash_test.py | 12 +
one_offs/trigger.s3.ingest.py | 43 +
parquet_flask/__init__.py | 5 +-
parquet_flask/__main__.py | 9 +-
parquet_flask/aws/aws_cred.py | 23 +-
parquet_flask/aws/es_abstract.py | 55 +
parquet_flask/aws/es_factory.py | 16 +
parquet_flask/aws/es_middleware.py | 202 ++
parquet_flask/aws/es_middleware_aws.py | 30 +
.../cdms_lambda_func/cdms_lambda_constants.py | 8 +
.../cdms_lambda_func/index_to_es}/__init__.py | 0
.../cdms_lambda_func/index_to_es/execute_lambda.py | 18 +
.../index_to_es/parquet_file_es_indexer.py | 85 +
.../index_to_es/parquet_stat_extractor.py | 35 +
.../index_to_es/s3_stat_extractor.py | 202 ++
.../ingest_s3_to_cdms/ingest_s3_to_cdms.py | 66 +-
parquet_flask/cdms_lambda_func/lambda_func_env.py | 3 +
.../cdms_lambda_func/lambda_logger_generator.py | 30 +
.../cdms_lambda_func/s3_records}/__init__.py | 0
.../cdms_lambda_func/s3_records/s3_2_sqs.py | 165 ++
.../s3_records/s3_event_validator_abstract.py | 19 +
parquet_flask/io_logic/cdms_constants.py | 21 +-
parquet_flask/io_logic/cdms_schema.py | 89 +
parquet_flask/io_logic/ingest_new_file.py | 92 +-
.../{metadata_tbl_io.py => metadata_tbl_es.py} | 44 +-
parquet_flask/io_logic/metadata_tbl_interface.py | 4 +
parquet_flask/io_logic/metadata_tbl_io.py | 4 +
.../io_logic/parquet_paths_es_retriever.py | 114 ++
.../parquet_query_condition_management_v3.py | 19 +-
...py => parquet_query_condition_management_v4.py} | 79 +-
parquet_flask/io_logic/partitioned_parquet_path.py | 48 +
parquet_flask/io_logic/query_v4.py | 51 +-
parquet_flask/io_logic/raw_query.py | 2 +-
parquet_flask/io_logic/replace_file.py | 2 +-
.../io_logic/sub_collection_statistics.py | 290 +++
.../parquet_stat_extractor}/__init__.py | 0
.../parquet_stat_extractor/local_spark_session.py | 16 +
.../local_statistics_retriever.py | 34 +
.../parquet_stat_extractor/statistics_retriever.py | 206 ++
.../statistics_retriever_wrapper.py | 39 +
parquet_flask/utils/config.py | 16 +-
parquet_flask/utils/factory_abstract.py | 7 +
parquet_flask/utils/general_utils.py | 8 +
parquet_flask/utils/parallel_json_validator.py | 15 +-
parquet_flask/utils/spatial_utils.py | 30 +
parquet_flask/utils/time_utils.py | 12 +-
parquet_flask/v1/__init__.py | 10 +-
.../v1/extract_statistics_from_parquet_file.py | 47 +
parquet_flask/v1/ingest_aws_json.py | 17 +-
.../v1/insitu_query_swagger/insitu-spec-0.0.1.yml | 2 +-
.../v1/query_data_doms_custom_pagination.py | 18 +-
.../v1/sub_collection_statistics_endpoint.py | 80 +
rotate_keys.bash | 28 +
setup.py | 5 +-
setup.py => setup_lambda.py | 18 +-
tests/back_to_basis/Test1/._SUCCESS.crc | Bin 0 -> 8 bytes
.../back_to_basis/Test1/_SUCCESS | 0
{parquet_cli => tests/back_to_basis}/__init__.py | 0
tests/back_to_basis/local_spark.py | 54 +
tests/back_to_basis/s3_read.py | 28 +
tests/back_to_basis/s3_spark.py | 51 +
tests/bench_mark/bench_mark.py | 52 +-
tests/bench_mark/bench_parallel_process.py | 32 +-
tests/get_aws_creds.py | 16 +
.../parquet_flask/aws}/__init__.py | 0
.../aws/manual_test_es_middleware_aws.py | 31 +
.../parquet_flask/cdms_lambda_func}/__init__.py | 0
.../cdms_lambda_func/index_to_es}/__init__.py | 0
.../manual_test_parquet_file_es_indexer.py | 75 +
.../index_to_es/test_parquet_stat_extractor.py | 37 +
.../index_to_es/test_s3_stat_extractor.py | 44 +
.../cdms_lambda_func/s3_records}/__init__.py | 0
.../cdms_lambda_func/s3_records/test_s3_s2_sqs.py | 32 +
.../manual_test_parquet_paths_es_retriever.py | 33 +
tests/parquet_flask/io_logic/test_cdms_schema.py | 29 +
.../parquet_flask/io_logic/test_ingest_new_file.py | 20 +
.../parquet_flask/io_logic/test_metadata_tbl_es.py | 54 +
.../test_parquet_query_condition_management_v3.py | 147 +-
.../io_logic/test_partitioned_parquet_path.py | 53 +-
.../parquet_stat_extractor}/__init__.py | 0
.../parquet_stat_extractor/in_situ_schema.json | 0
...882-3536-435b-b736-96bf3be9ee29.c000.gz.parquet | Bin 0 -> 17393 bytes
.../test_local_statistics_retriever.py | 170 ++
.../test_statistics_retriever.py | 63 +
tests/parquet_flask/utils/test_general_utils.py | 7 +
tests/parquet_flask/utils/test_spatial_utils.py | 23 +
107 files changed, 6367 insertions(+), 272 deletions(-)
create mode 100644 ci.cd/Makefile
create mode 100755 ci.cd/create_s3_zip.sh
create mode 100644 ci.cd/lambda_docker_upload.sh
create mode 100755 ci.cd/local_upload.sh
create mode 100644 docker/parquet.lambda.Dockerfile
create mode 100644 documentations/navair.demo.md
create mode 100644 etc/elasticsearch/all_alias.json
create mode 100644 etc/elasticsearch/entry_file_records.json
create mode 100644 etc/elasticsearch/parquet_stats_v1.json
create mode 100644 etc/elasticsearch/setup_es.txt
create mode 100644 etc/lambda-spark/spark-class
create mode 100644 etc/lambda-spark/spark-defaults.conf
create mode 100644 k8s_spark/k8s_spark/org.alues.yaml
create mode 100644 k8s_spark/nohup.out
create mode 100644 k8s_spark/parquet.spark.helm/charts/spark-5.9.4.tgz
create mode 100644 k8s_spark/parquet.spark.helm/nohup.out
create mode 100644 nohup.out
create mode 100644 one_offs/local_flask.py
create mode 100644 one_offs/py_geo_hash_test.py
create mode 100644 one_offs/trigger.s3.ingest.py
create mode 100644 parquet_flask/aws/es_abstract.py
create mode 100644 parquet_flask/aws/es_factory.py
create mode 100644 parquet_flask/aws/es_middleware.py
create mode 100644 parquet_flask/aws/es_middleware_aws.py
create mode 100644 parquet_flask/cdms_lambda_func/cdms_lambda_constants.py
copy {parquet_cli => parquet_flask/cdms_lambda_func/index_to_es}/__init__.py (100%)
create mode 100644 parquet_flask/cdms_lambda_func/index_to_es/execute_lambda.py
create mode 100644 parquet_flask/cdms_lambda_func/index_to_es/parquet_file_es_indexer.py
create mode 100644 parquet_flask/cdms_lambda_func/index_to_es/parquet_stat_extractor.py
create mode 100644 parquet_flask/cdms_lambda_func/index_to_es/s3_stat_extractor.py
create mode 100644 parquet_flask/cdms_lambda_func/lambda_logger_generator.py
copy {parquet_cli => parquet_flask/cdms_lambda_func/s3_records}/__init__.py (100%)
create mode 100644 parquet_flask/cdms_lambda_func/s3_records/s3_2_sqs.py
create mode 100644 parquet_flask/cdms_lambda_func/s3_records/s3_event_validator_abstract.py
copy parquet_flask/io_logic/{metadata_tbl_io.py => metadata_tbl_es.py} (57%)
create mode 100644 parquet_flask/io_logic/parquet_paths_es_retriever.py
copy parquet_flask/io_logic/{parquet_query_condition_management_v3.py => parquet_query_condition_management_v4.py} (64%)
create mode 100644 parquet_flask/io_logic/sub_collection_statistics.py
copy {parquet_cli => parquet_flask/parquet_stat_extractor}/__init__.py (100%)
create mode 100644 parquet_flask/parquet_stat_extractor/local_spark_session.py
create mode 100644 parquet_flask/parquet_stat_extractor/local_statistics_retriever.py
create mode 100644 parquet_flask/parquet_stat_extractor/statistics_retriever.py
create mode 100644 parquet_flask/parquet_stat_extractor/statistics_retriever_wrapper.py
create mode 100644 parquet_flask/utils/factory_abstract.py
create mode 100644 parquet_flask/utils/spatial_utils.py
create mode 100644 parquet_flask/v1/extract_statistics_from_parquet_file.py
create mode 100644 parquet_flask/v1/sub_collection_statistics_endpoint.py
create mode 100755 rotate_keys.bash
copy setup.py => setup_lambda.py (65%)
create mode 100644 tests/back_to_basis/Test1/._SUCCESS.crc
copy parquet_cli/__init__.py => tests/back_to_basis/Test1/_SUCCESS (100%)
copy {parquet_cli => tests/back_to_basis}/__init__.py (100%)
create mode 100644 tests/back_to_basis/local_spark.py
create mode 100644 tests/back_to_basis/s3_read.py
create mode 100644 tests/back_to_basis/s3_spark.py
create mode 100644 tests/get_aws_creds.py
copy {parquet_cli => tests/parquet_flask/aws}/__init__.py (100%)
create mode 100644 tests/parquet_flask/aws/manual_test_es_middleware_aws.py
copy {parquet_cli => tests/parquet_flask/cdms_lambda_func}/__init__.py (100%)
copy {parquet_cli => tests/parquet_flask/cdms_lambda_func/index_to_es}/__init__.py (100%)
create mode 100644 tests/parquet_flask/cdms_lambda_func/index_to_es/manual_test_parquet_file_es_indexer.py
create mode 100644 tests/parquet_flask/cdms_lambda_func/index_to_es/test_parquet_stat_extractor.py
create mode 100644 tests/parquet_flask/cdms_lambda_func/index_to_es/test_s3_stat_extractor.py
copy {parquet_cli => tests/parquet_flask/cdms_lambda_func/s3_records}/__init__.py (100%)
create mode 100644 tests/parquet_flask/cdms_lambda_func/s3_records/test_s3_s2_sqs.py
create mode 100644 tests/parquet_flask/io_logic/manual_test_parquet_paths_es_retriever.py
create mode 100644 tests/parquet_flask/io_logic/test_cdms_schema.py
create mode 100644 tests/parquet_flask/io_logic/test_ingest_new_file.py
create mode 100644 tests/parquet_flask/io_logic/test_metadata_tbl_es.py
copy {parquet_cli => tests/parquet_flask/parquet_stat_extractor}/__init__.py (100%)
copy in_situ_schema.json => tests/parquet_flask/parquet_stat_extractor/in_situ_schema.json (100%)
create mode 100644 tests/parquet_flask/parquet_stat_extractor/part-00000-74ebb882-3536-435b-b736-96bf3be9ee29.c000.gz.parquet
create mode 100644 tests/parquet_flask/parquet_stat_extractor/test_local_statistics_retriever.py
create mode 100644 tests/parquet_flask/parquet_stat_extractor/test_statistics_retriever.py
create mode 100644 tests/parquet_flask/utils/test_spatial_utils.py