You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@sdap.apache.org by rk...@apache.org on 2023/03/13 17:47:57 UTC
[incubator-sdap-in-situ-data-services] branch master updated (7664731 -> c59d8db)
This is an automated email from the ASF dual-hosted git repository.
rkk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-sdap-in-situ-data-services.git
from 7664731 /version 0.3.0
new aba83f0 feat: add CLI script
new 4f72eb3 chore: update readme
new 8d78b1b chore: add changelog
new 0c1c4d1 fix: added `meta` column as defaul column
new 7221ffe fix: add default column in correct place
new c2a6d2a chore: update changelog + swagger
new c97b9b2 chore: merged from apache master
new 65c4651 chore: merge from apache master
new bc2e14a chore: switch to SDAP ticket
new 57ed8dc Merge branch 'master' of github.com:wphyojpl/incubator-sdap-in-situ-data-services
new 786d92a Merge branch 'master' of github.com:apache/incubator-sdap-in-situ-data-services
new 982d498 feat: add ci/cd to build lambda zip file
new 3d0fe1c breaking:Elasticsearch Logic (#1)
new 576b1c1 fix: get lambda function working
new 8d374b0 fix: throw runtime err when ES ingest fail
new f2095f3 chore: add lambda logger
new e304425 feat: use cdms_schema json to create spark struct obj
new 9824c6e chore: merge from master
new c45898c chore: use insitu schema json to get column names
new c001092 feat: add observation type counts
new f33bf06 fix: add missig param in calling stats retriever
new eeff2b9 fix: add observation agg in query
new 6398704 fix: get parquet stat to ES working for SQS multiple records input
new 69c4e39 fix: not throwing error when deleting items that do not exist in ES
new 9e30423 fix: allow config w/o checking mandatory variables
new 0fbb6a0 feat: download small parquet file to extract stat locally
new 07f6000 fix: use singleton to re-use session to reduce time
new e31e246 fix: return NULL if query not found in ES + enhance statistics endpoint to include depth, time and bbox
new c77c943 fix: use unquote_plus to replace `+` to ` ` for s3 url
new acc410f feat: not waiting for ingest to finish
new 9800a38 fix: ci/cd for lambda-docker
new fbd7f20 fix: get lambda with ECS working with pyspark
new f7d28d6 fix: add missing files
new 040776c fix: update makefile
new 3adf08a fix: use pandas to avoid int + float in source data
new 1d83a52 fix:s3 ingestion lambda works as a zip now
new 09f82be feat: make 30x30 tiles + different log level for spark vs cdms flask
new 4d8a2e8 fix: query-missing-depth = float, log statement when deleting ES, ingest w/o pandas + mandatory depth conversion to float
new 8b046e0 fix: UnsupportedOperationException:org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary workaround
new 858e23a fix: wind from & to is in long which screws with the schema
new feaf344 fix: make unique spark-app-name
new cefb0f0 feat: use alias instead of real index
new f3d7466 fix: update test constants to use alias
new 2daba45 feat: Configurable partitioning (#3)
new 69da174 Merge branch 'es.branch' of github.com:wphyojpl/incubator-sdap-in-situ-data-services into es.branch
new 5c70764 fix: config is in string form. not in int form
new e03a862 fix: config is in string form. not in int form
new c8779b1 feat: add elasticsearch index for ddb table
new 7eab73f chore: add missing test file
new 140d017 feat: add ES based metadata table
new 32de57e feat: using ES for metadata
new 8c6b786 fix: need to pass empty str, not None
new 1209603 fix: parallel validator bug
new c59d8db Merge pull request #12 from wphyojpl/es.branch
The 232 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
.gitignore | 4 +-
ci.cd/Makefile | 41 +
ci.cd/create_s3_zip.sh | 27 +
ci.cd/lambda_docker_upload.sh | 6 +
ci.cd/local_upload.sh | 8 +
docker/parquet.lambda.Dockerfile | 43 +
documentations/navair.demo.md | 106 ++
etc/elasticsearch/all_alias.json | 7 +
etc/elasticsearch/entry_file_records.json | 20 +
etc/elasticsearch/parquet_stats_v1.json | 64 +
etc/elasticsearch/setup_es.txt | 15 +
etc/lambda-spark/spark-class | 8 +
etc/lambda-spark/spark-defaults.conf | 4 +
k8s_spark/k8s_spark/org.alues.yaml | 731 ++++++++
k8s_spark/nohup.out | 4 +
.../parquet.spark.helm/charts/spark-5.9.4.tgz | Bin 0 -> 36223 bytes
k8s_spark/parquet.spark.helm/nohup.out | 1973 ++++++++++++++++++++
.../parquet.spark.helm/templates/deployment.yaml | 4 +
k8s_spark/parquet.spark.helm/values.yaml | 10 +-
nohup.out | 4 +
one_offs/local_flask.py | 16 +
one_offs/py_geo_hash_test.py | 12 +
one_offs/trigger.s3.ingest.py | 43 +
parquet_flask/__init__.py | 5 +-
parquet_flask/__main__.py | 9 +-
parquet_flask/aws/aws_cred.py | 23 +-
parquet_flask/aws/es_abstract.py | 55 +
parquet_flask/aws/es_factory.py | 16 +
parquet_flask/aws/es_middleware.py | 202 ++
parquet_flask/aws/es_middleware_aws.py | 30 +
.../cdms_lambda_func/cdms_lambda_constants.py | 8 +
.../cdms_lambda_func/index_to_es}/__init__.py | 0
.../cdms_lambda_func/index_to_es/execute_lambda.py | 18 +
.../index_to_es/parquet_file_es_indexer.py | 85 +
.../index_to_es/parquet_stat_extractor.py | 35 +
.../index_to_es/s3_stat_extractor.py | 202 ++
.../ingest_s3_to_cdms/ingest_s3_to_cdms.py | 66 +-
parquet_flask/cdms_lambda_func/lambda_func_env.py | 3 +
.../cdms_lambda_func/lambda_logger_generator.py | 30 +
.../cdms_lambda_func/s3_records}/__init__.py | 0
.../cdms_lambda_func/s3_records/s3_2_sqs.py | 165 ++
.../s3_records/s3_event_validator_abstract.py | 19 +
parquet_flask/io_logic/cdms_constants.py | 21 +-
parquet_flask/io_logic/cdms_schema.py | 89 +
parquet_flask/io_logic/ingest_new_file.py | 92 +-
.../{metadata_tbl_io.py => metadata_tbl_es.py} | 44 +-
parquet_flask/io_logic/metadata_tbl_interface.py | 4 +
parquet_flask/io_logic/metadata_tbl_io.py | 4 +
.../io_logic/parquet_paths_es_retriever.py | 114 ++
.../parquet_query_condition_management_v3.py | 19 +-
...py => parquet_query_condition_management_v4.py} | 79 +-
parquet_flask/io_logic/partitioned_parquet_path.py | 48 +
parquet_flask/io_logic/query_v4.py | 51 +-
parquet_flask/io_logic/raw_query.py | 2 +-
parquet_flask/io_logic/replace_file.py | 2 +-
.../io_logic/sub_collection_statistics.py | 290 +++
.../parquet_stat_extractor}/__init__.py | 0
.../parquet_stat_extractor/local_spark_session.py | 16 +
.../local_statistics_retriever.py | 34 +
.../parquet_stat_extractor/statistics_retriever.py | 206 ++
.../statistics_retriever_wrapper.py | 39 +
parquet_flask/utils/config.py | 16 +-
parquet_flask/utils/factory_abstract.py | 7 +
parquet_flask/utils/general_utils.py | 8 +
parquet_flask/utils/parallel_json_validator.py | 15 +-
parquet_flask/utils/spatial_utils.py | 30 +
parquet_flask/utils/time_utils.py | 12 +-
parquet_flask/v1/__init__.py | 10 +-
.../v1/extract_statistics_from_parquet_file.py | 47 +
parquet_flask/v1/ingest_aws_json.py | 17 +-
.../v1/insitu_query_swagger/insitu-spec-0.0.1.yml | 2 +-
.../v1/query_data_doms_custom_pagination.py | 18 +-
.../v1/sub_collection_statistics_endpoint.py | 80 +
rotate_keys.bash | 28 +
setup.py | 5 +-
setup.py => setup_lambda.py | 18 +-
tests/back_to_basis/Test1/._SUCCESS.crc | Bin 0 -> 8 bytes
.../back_to_basis/Test1/_SUCCESS | 0
{parquet_cli => tests/back_to_basis}/__init__.py | 0
tests/back_to_basis/local_spark.py | 54 +
tests/back_to_basis/s3_read.py | 28 +
tests/back_to_basis/s3_spark.py | 51 +
tests/bench_mark/bench_mark.py | 52 +-
tests/bench_mark/bench_parallel_process.py | 32 +-
tests/get_aws_creds.py | 16 +
.../parquet_flask/aws}/__init__.py | 0
.../aws/manual_test_es_middleware_aws.py | 31 +
.../parquet_flask/cdms_lambda_func}/__init__.py | 0
.../cdms_lambda_func/index_to_es}/__init__.py | 0
.../manual_test_parquet_file_es_indexer.py | 75 +
.../index_to_es/test_parquet_stat_extractor.py | 37 +
.../index_to_es/test_s3_stat_extractor.py | 44 +
.../cdms_lambda_func/s3_records}/__init__.py | 0
.../cdms_lambda_func/s3_records/test_s3_s2_sqs.py | 32 +
.../manual_test_parquet_paths_es_retriever.py | 33 +
tests/parquet_flask/io_logic/test_cdms_schema.py | 29 +
.../parquet_flask/io_logic/test_ingest_new_file.py | 20 +
.../parquet_flask/io_logic/test_metadata_tbl_es.py | 54 +
.../test_parquet_query_condition_management_v3.py | 147 +-
.../io_logic/test_partitioned_parquet_path.py | 53 +-
.../parquet_stat_extractor}/__init__.py | 0
.../parquet_stat_extractor/in_situ_schema.json | 0
...882-3536-435b-b736-96bf3be9ee29.c000.gz.parquet | Bin 0 -> 17393 bytes
.../test_local_statistics_retriever.py | 170 ++
.../test_statistics_retriever.py | 63 +
tests/parquet_flask/utils/test_general_utils.py | 7 +
tests/parquet_flask/utils/test_spatial_utils.py | 23 +
107 files changed, 6367 insertions(+), 272 deletions(-)
create mode 100644 ci.cd/Makefile
create mode 100755 ci.cd/create_s3_zip.sh
create mode 100644 ci.cd/lambda_docker_upload.sh
create mode 100755 ci.cd/local_upload.sh
create mode 100644 docker/parquet.lambda.Dockerfile
create mode 100644 documentations/navair.demo.md
create mode 100644 etc/elasticsearch/all_alias.json
create mode 100644 etc/elasticsearch/entry_file_records.json
create mode 100644 etc/elasticsearch/parquet_stats_v1.json
create mode 100644 etc/elasticsearch/setup_es.txt
create mode 100644 etc/lambda-spark/spark-class
create mode 100644 etc/lambda-spark/spark-defaults.conf
create mode 100644 k8s_spark/k8s_spark/org.alues.yaml
create mode 100644 k8s_spark/nohup.out
create mode 100644 k8s_spark/parquet.spark.helm/charts/spark-5.9.4.tgz
create mode 100644 k8s_spark/parquet.spark.helm/nohup.out
create mode 100644 nohup.out
create mode 100644 one_offs/local_flask.py
create mode 100644 one_offs/py_geo_hash_test.py
create mode 100644 one_offs/trigger.s3.ingest.py
create mode 100644 parquet_flask/aws/es_abstract.py
create mode 100644 parquet_flask/aws/es_factory.py
create mode 100644 parquet_flask/aws/es_middleware.py
create mode 100644 parquet_flask/aws/es_middleware_aws.py
create mode 100644 parquet_flask/cdms_lambda_func/cdms_lambda_constants.py
copy {parquet_cli => parquet_flask/cdms_lambda_func/index_to_es}/__init__.py (100%)
create mode 100644 parquet_flask/cdms_lambda_func/index_to_es/execute_lambda.py
create mode 100644 parquet_flask/cdms_lambda_func/index_to_es/parquet_file_es_indexer.py
create mode 100644 parquet_flask/cdms_lambda_func/index_to_es/parquet_stat_extractor.py
create mode 100644 parquet_flask/cdms_lambda_func/index_to_es/s3_stat_extractor.py
create mode 100644 parquet_flask/cdms_lambda_func/lambda_logger_generator.py
copy {parquet_cli => parquet_flask/cdms_lambda_func/s3_records}/__init__.py (100%)
create mode 100644 parquet_flask/cdms_lambda_func/s3_records/s3_2_sqs.py
create mode 100644 parquet_flask/cdms_lambda_func/s3_records/s3_event_validator_abstract.py
copy parquet_flask/io_logic/{metadata_tbl_io.py => metadata_tbl_es.py} (57%)
create mode 100644 parquet_flask/io_logic/parquet_paths_es_retriever.py
copy parquet_flask/io_logic/{parquet_query_condition_management_v3.py => parquet_query_condition_management_v4.py} (64%)
create mode 100644 parquet_flask/io_logic/sub_collection_statistics.py
copy {parquet_cli => parquet_flask/parquet_stat_extractor}/__init__.py (100%)
create mode 100644 parquet_flask/parquet_stat_extractor/local_spark_session.py
create mode 100644 parquet_flask/parquet_stat_extractor/local_statistics_retriever.py
create mode 100644 parquet_flask/parquet_stat_extractor/statistics_retriever.py
create mode 100644 parquet_flask/parquet_stat_extractor/statistics_retriever_wrapper.py
create mode 100644 parquet_flask/utils/factory_abstract.py
create mode 100644 parquet_flask/utils/spatial_utils.py
create mode 100644 parquet_flask/v1/extract_statistics_from_parquet_file.py
create mode 100644 parquet_flask/v1/sub_collection_statistics_endpoint.py
create mode 100755 rotate_keys.bash
copy setup.py => setup_lambda.py (65%)
create mode 100644 tests/back_to_basis/Test1/._SUCCESS.crc
copy parquet_cli/__init__.py => tests/back_to_basis/Test1/_SUCCESS (100%)
copy {parquet_cli => tests/back_to_basis}/__init__.py (100%)
create mode 100644 tests/back_to_basis/local_spark.py
create mode 100644 tests/back_to_basis/s3_read.py
create mode 100644 tests/back_to_basis/s3_spark.py
create mode 100644 tests/get_aws_creds.py
copy {parquet_cli => tests/parquet_flask/aws}/__init__.py (100%)
create mode 100644 tests/parquet_flask/aws/manual_test_es_middleware_aws.py
copy {parquet_cli => tests/parquet_flask/cdms_lambda_func}/__init__.py (100%)
copy {parquet_cli => tests/parquet_flask/cdms_lambda_func/index_to_es}/__init__.py (100%)
create mode 100644 tests/parquet_flask/cdms_lambda_func/index_to_es/manual_test_parquet_file_es_indexer.py
create mode 100644 tests/parquet_flask/cdms_lambda_func/index_to_es/test_parquet_stat_extractor.py
create mode 100644 tests/parquet_flask/cdms_lambda_func/index_to_es/test_s3_stat_extractor.py
copy {parquet_cli => tests/parquet_flask/cdms_lambda_func/s3_records}/__init__.py (100%)
create mode 100644 tests/parquet_flask/cdms_lambda_func/s3_records/test_s3_s2_sqs.py
create mode 100644 tests/parquet_flask/io_logic/manual_test_parquet_paths_es_retriever.py
create mode 100644 tests/parquet_flask/io_logic/test_cdms_schema.py
create mode 100644 tests/parquet_flask/io_logic/test_ingest_new_file.py
create mode 100644 tests/parquet_flask/io_logic/test_metadata_tbl_es.py
copy {parquet_cli => tests/parquet_flask/parquet_stat_extractor}/__init__.py (100%)
copy in_situ_schema.json => tests/parquet_flask/parquet_stat_extractor/in_situ_schema.json (100%)
create mode 100644 tests/parquet_flask/parquet_stat_extractor/part-00000-74ebb882-3536-435b-b736-96bf3be9ee29.c000.gz.parquet
create mode 100644 tests/parquet_flask/parquet_stat_extractor/test_local_statistics_retriever.py
create mode 100644 tests/parquet_flask/parquet_stat_extractor/test_statistics_retriever.py
create mode 100644 tests/parquet_flask/utils/test_spatial_utils.py