Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/05 02:01:54 UTC
[GitHub] [arrow] GavinRay97 opened a new issue #12570: Arrow nightly Maven releases don't seem to work
GavinRay97 opened a new issue #12570:
URL: https://github.com/apache/arrow/issues/12570
Following the instructions listed here:
- https://github.com/apache/arrow/blob/650f111b524fb1c5bfbfa6f533d15929c90ddc40/docs/source/java/install.rst#installing-nightly-packages
I get the following error when trying to install. I think the content type is being misinterpreted (as HTML rather than XML):
```text
[WARNING] The POM for org.apache.arrow:arrow-flight:jar:8.0.0.dev165 is invalid, transitive dependencies (if any) will not be available: 1 problem was encountered while building the effective model
[FATAL] Non-parseable POM C:\Users\rayga\.m2\repository\org\apache\arrow\arrow-flight\8.0.0.dev165\arrow-flight-8.0.0.dev165.pom: expected = after attribute name (position: TEXT seen ...l="preconnect" href="https://github.githubassets.com" crossorigin>... @15:77) @ line 15, column 77
```
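One quick way to check this suspicion is to look at the start of the downloaded `.pom` file. Below is a minimal sketch in JavaScript, assuming the file contents have already been read into a string; `looksLikeHtml` is a hypothetical helper name and not part of Maven or Arrow:

```js
// Hypothetical helper: guess whether text fetched as a POM is really an HTML page.
function looksLikeHtml(text) {
  // Skip a byte-order mark and leading whitespace before inspecting the first tag.
  const head = text.replace(/^\uFEFF/, "").trimStart().slice(0, 200).toLowerCase()
  return head.startsWith("<!doctype html") || head.startsWith("<html")
}

// Made-up sample inputs for illustration.
const pomSample = '<?xml version="1.0" encoding="UTF-8"?>\n<project></project>'
const htmlSample = '<!DOCTYPE html>\n<html lang="en"><head></head></html>'
console.log(looksLikeHtml(pomSample))  // false
console.log(looksLikeHtml(htmlSample)) // true
```

A real POM normally starts with an XML declaration or a `<project>` element, so a `true` result here would suggest that the repository URL returned a web page instead of the asset.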
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] davisusanibar commented on issue #12570: Arrow nightly Maven releases don't seem to work
Posted by GitBox <gi...@apache.org>.
davisusanibar commented on issue #12570:
URL: https://github.com/apache/arrow/issues/12570#issuecomment-1060621861
Hi team, sorry to join late.
Thank you @GavinRay97. The libraries are downloaded, but the POM/JAR files are invalid.
Regarding the update to the [Arrow Java Nightly Doc](https://github.com/apache/arrow/blob/650f111b524fb1c5bfbfa6f533d15929c90ddc40/docs/source/java/install.rst#installing-nightly-packages), I have reviewed the issue and I see two options:
1. Analyze and configure how to use the GitHub nightly releases in a transparent manner
2. Define a workaround to build the Arrow Java nightly dependencies locally:
- Add your repo to the documentation
- Define a more generic integration (without modifying or adding more configuration) and add it to the documentation
I am working on a generic nightly build implementation using the following shell script.
Code to add to the docs:
```
#!/bin/bash
# Shell variables
ARROW_JAVA_NIGHTLY_VERSION=${1:-'nightly-2022-03-03-0-github-java-jars'}
DEPENDENCY_TO_INSTALL=${2:-'arrow'}
# Local variables
TMP_FOLDER=arrow_java_$(date +"%d-%m-%Y")
PATTERN_TO_GET_LIB_AND_VERSION='([a-z].+)-([0-9].[0-9].[0-9].dev[0-9]+).([a-z]+)'
# Application logic
echo "$DEPENDENCY_TO_INSTALL"
mkdir -p "$TMP_FOLDER"
pushd "$TMP_FOLDER"
echo "**************** 1 - Download arrow-java $ARROW_JAVA_NIGHTLY_VERSION dependencies ****************"
# Query the GitHub API for the release assets, keep the .pom/.jar URLs, and download them
wget $( \
  wget \
    -qO- "https://api.github.com/repos/ursacomputing/crossbow/releases/tags/$ARROW_JAVA_NIGHTLY_VERSION" \
    | jq -r '.assets[] | select((.name | endswith(".pom")) or (.name | endswith(".jar"))) | .browser_download_url' \
    | grep "$DEPENDENCY_TO_INSTALL" )
echo "**************** 2 - Install arrow java libraries to local repository ****************"
for LIBRARY in $(ls | grep -E '\.jar$' | grep dev); do
  # Skip files whose name does not match <library>-<version>.<suffix>
  [[ $LIBRARY =~ $PATTERN_TO_GET_LIB_AND_VERSION ]] || continue
  FILE=$PWD/${BASH_REMATCH[0]}
  if [[ ${BASH_REMATCH[0]} == *"$DEPENDENCY_TO_INSTALL"* ]]; then
    if [ -f "$FILE" ]; then
      : # The regex matched the full file name
    elif [ -f "$FILE.jar" ]; then # Out of regex: -javadoc.jar / -sources.jar
      FILE=$FILE.jar
    elif [ -f "$FILE-with-dependencies.jar" ]; then # Out of regex: -with-dependencies.jar
      FILE=$FILE-with-dependencies.jar
    else
      echo "Please review $FILE: it was not installed in the local m2 repository."
      continue
    fi
    echo "$FILE"
    mvn install:install-file \
      -Dfile="$FILE" \
      -DgroupId=org.apache.arrow \
      -DartifactId="${BASH_REMATCH[1]}" \
      -Dversion="${BASH_REMATCH[2]}" \
      -Dpackaging="${BASH_REMATCH[3]}" \
      -DcreateChecksum=true \
      -DgeneratePom=true
  fi
done
popd
# rm -rf "$TMP_FOLDER"
echo "Go to your project and execute: mvn clean install"
```
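Execute: download all dependencies, or only the JARs you need. The script uses Bash features (`[[ ]]`, `BASH_REMATCH`, `pushd`), so run it with `bash` rather than `sh`:
```
# Download and install all dependencies
bash arrow_java_nightly.sh nightly-2022-03-03-0-github-java-jars
# Download and install only the libraries you need, for example: memory
bash arrow_java_nightly.sh nightly-2022-03-03-0-github-java-jars memory
```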
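For readers without `jq`, the asset selection in step 1 of the script can be sketched in JavaScript. This assumes the release JSON has the shape the script's `jq` filter already relies on (an `assets` array whose entries carry `name` and `browser_download_url`). `selectAssetUrls` and the sample object are illustrative only, and the plain substring test is a simplification of `grep`, which matches a regular expression:

```js
// Hypothetical helper mirroring the jq filter plus the trailing grep:
// keep .pom/.jar assets whose download URL contains the requested substring.
function selectAssetUrls(release, dependency = "arrow") {
  return release.assets
    .filter((a) => a.name.endsWith(".pom") || a.name.endsWith(".jar"))
    .map((a) => a.browser_download_url)
    .filter((url) => url.includes(dependency))
}

// Made-up sample data in the same shape; example.invalid is a placeholder host.
const sampleRelease = {
  assets: [
    { name: "arrow-memory-core-8.0.0.dev165.jar", browser_download_url: "https://example.invalid/arrow-memory-core-8.0.0.dev165.jar" },
    { name: "arrow-memory-core-8.0.0.dev165.pom", browser_download_url: "https://example.invalid/arrow-memory-core-8.0.0.dev165.pom" },
    { name: "notes.txt", browser_download_url: "https://example.invalid/notes.txt" },
  ],
}
console.log(selectAssetUrls(sampleRelease, "memory"))
```

With the sample data, only the two `arrow-memory-core` URLs are selected; `notes.txt` is dropped by the extension check.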
Use: in your pom.xml, add the dependencies and the version you need
```
...
<properties>
    <arrow.version>8.0.0.dev165</arrow.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>arrow-memory-core</artifactId>
        <version>${arrow.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>arrow-memory-netty</artifactId>
        <version>${arrow.version}</version>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
        <version>${logback.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>flight-core</artifactId>
        <version>${arrow.version}</version>
    </dependency>
</dependencies>
...
```
Run:
```
mvn clean install
```
Could you please check whether this works on your side?
Thank you in advance.
[GitHub] [arrow] lidavidm commented on issue #12570: Arrow nightly Maven releases don't seem to work
Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #12570:
URL: https://github.com/apache/arrow/issues/12570#issuecomment-1060970759
A JIRA was filed here: https://issues.apache.org/jira/browse/ARROW-15865
[GitHub] [arrow] GavinRay97 edited a comment on issue #12570: Arrow nightly Maven releases don't seem to work
Posted by GitBox <gi...@apache.org>.
GavinRay97 edited a comment on issue #12570:
URL: https://github.com/apache/arrow/issues/12570#issuecomment-1059817506
I've used a regular GitHub repository as a Maven repository before; for that you have to use the "raw" URL:
```groovy
repositories {
maven {
name "expecty"
url "https://raw.github.com/pniederw/expecty/master/m2repo/"
}
}
```
Maybe something like this might be needed for using tagged releases too? I checked the POM it pulled and it's the HTML of the GitHub page rather than the actual asset.
(I wanted to start prototyping a project with FlightSQL but there was some issue with it making it into the v7.0.0 release POMs)
Today I will try to write a script that takes the URL to the nightly Java releases, downloads all the assets, and then creates the proper M2 folder structure for the version number.
I'll publish last night's releases to this repo and share the URL for anyone else who might want a temporary fix until the 7.0.1 or 8.0.0 staging releases 👍
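As a rough sketch of the folder structure such a script needs to produce: a Maven repository is laid out as `<group path>/<artifactId>/<version>/<file>`. The helper below is illustrative only; `m2RelativePath` is a made-up name, and the version pattern assumes nightly versions shaped like `8.0.0.dev165`:

```js
// Hypothetical helper: map an asset filename to its path inside an M2 repository.
function m2RelativePath(filename, groupId = "org.apache.arrow") {
  // Assumes nightly versions of the form <digits>.<digits>.<digits>.dev<digits>.
  const match = filename.match(/^(?<artifactId>.+)-(?<version>\d+\.\d+\.\d+\.dev\d+)/)
  if (!match) return null
  const { artifactId, version } = match.groups
  // The groupId's dots become directory separators.
  return [...groupId.split("."), artifactId, version, filename].join("/")
}

console.log(m2RelativePath("arrow-flight-8.0.0.dev165.pom"))
// org/apache/arrow/arrow-flight/8.0.0.dev165/arrow-flight-8.0.0.dev165.pom
```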
[GitHub] [arrow] GavinRay97 edited a comment on issue #12570: Arrow nightly Maven releases don't seem to work
Posted by GitBox <gi...@apache.org>.
GavinRay97 edited a comment on issue #12570:
URL: https://github.com/apache/arrow/issues/12570#issuecomment-1059852761
Here is a Node.js script to download from the Nightlies and extract the assets into a Maven repository structure, along with the 03/03 jars published as a usable M2 repo.
Instructions for use with Gradle/Maven are here:
https://github.com/GavinRay97/arrow-nightlies-repo
[GitHub] [arrow] lidavidm commented on issue #12570: Arrow nightly Maven releases don't seem to work
Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #12570:
URL: https://github.com/apache/arrow/issues/12570#issuecomment-1059770864
@davisusanibar were you able to get this to work?
[GitHub] [arrow] GavinRay97 edited a comment on issue #12570: Arrow nightly Maven releases don't seem to work
Posted by GitBox <gi...@apache.org>.
GavinRay97 edited a comment on issue #12570:
URL: https://github.com/apache/arrow/issues/12570#issuecomment-1059852761
Here is a Node.js script to download from the Nightlies and extract the assets into Maven repository structure:
```json
{
"name": "arrow-download-nightly-as-maven-repo",
"scripts": {
"start": "node index.mjs"
},
"dependencies": {
"cross-fetch": "^3.1.5",
"jsdom": "^19.0.0"
}
}
```
```js
// index.mjs
// Run with: $ node index.mjs
import fetch from "cross-fetch"
import fs from "fs"
import asyncFS from "fs/promises"
import { JSDOM } from "jsdom"
import path from "path"
import { fileURLToPath } from "url"

// Polyfill "__dirname" for Node.js ECMAScript Module filetype
const __dirname = path.dirname(fileURLToPath(import.meta.url))

const ARROW_NIGHTLY_TAG_URL =
  "https://github.com/ursacomputing/crossbow/releases/tag/nightly-2022-03-03-0-github-java-jars"

async function main() {
  // Await so that failures reach the catch handler below
  await extractArrowNightlyJarsToLocalM2Repo(ARROW_NIGHTLY_TAG_URL)
}

main().catch((err) => {
  console.error(err)
  process.exit(1)
})

async function extractArrowNightlyJarsToLocalM2Repo(arrowNightlyTagUrl) {
  // Parse HTML to DOM
  const dom = await JSDOM.fromURL(arrowNightlyTagUrl)
  const document = dom.window.document
  // Get all <li> tags containing the asset name and download URL
  const assetLinkEls = document.querySelectorAll("li.Box-row")
  // Map of library name -> list of { version, link, assetFilename }
  const assets = {}
  for (const el of assetLinkEls) {
    const anchorTag = el.querySelector("a")
    if (!anchorTag) continue
    const assetFilename = anchorTag.textContent.trim()
    const link = anchorTag.href
    if (assetFilename.includes("Source code")) continue
    // Skip assets whose name does not look like <lib-name>-<version>
    const parsed = getLibraryAndVersionFromAssetFilename(assetFilename)
    if (!parsed) continue
    const { library, version } = parsed
    if (assets[library]) {
      assets[library].push({ version, link, assetFilename })
    } else {
      assets[library] = [{ version, link, assetFilename }]
    }
  }
  for (const [library, versions] of Object.entries(assets)) {
    for (const { version, link, assetFilename } of versions) {
      const basePath = "org/apache/arrow"
      const libPath = `${library}/${version}`
      const fullPath = path.join(__dirname, "../", basePath, libPath)
      // Wait for the directory to exist before writing into it
      await asyncFS.mkdir(fullPath, { recursive: true })
      console.log("Downloading " + assetFilename + " to " + fullPath)
      await downloadUrlAssetToPath(link, path.join(fullPath, assetFilename))
    }
  }
}

async function downloadUrlAssetToPath(url, filepath) {
  const response = await fetch(url)
  if (!response.ok) throw new Error(`Failed to download ${url}: HTTP ${response.status}`)
  // Stream the body to disk so binary JARs are written byte-for-byte
  const fileStream = fs.createWriteStream(filepath)
  return new Promise((resolve, reject) => {
    response.body.pipe(fileStream)
    response.body.on("error", reject)
    fileStream.on("error", reject)
    fileStream.on("finish", resolve)
  })
}

// M2 repo folder format:
// org/apache/arrow/<lib-name>/<version>/<lib-name>-<version>.(ext)
function getLibraryAndVersionFromAssetFilename(filename) {
  const libraryAndVersionRegex = /(?<library>.+)-(?<version>\d\.\d\.\d\.dev\d+)/
  return filename.match(libraryAndVersionRegex)?.groups
}
```
```sh
user@MSI:~/projects/arrow-download-nightly-as-maven-repo$ tree org/
org/
└── apache
    └── arrow
        ├── arrow-algorithm
        │   └── 8.0.0.dev165
        │       ├── arrow-algorithm-8.0.0.dev165-javadoc.jar
        │       ├── arrow-algorithm-8.0.0.dev165-sources.jar
        │       ├── arrow-algorithm-8.0.0.dev165-tests.jar
        │       ├── arrow-algorithm-8.0.0.dev165.jar
        │       └── arrow-algorithm-8.0.0.dev165.pom
        ├── arrow-avro
        │   └── 8.0.0.dev165
        │       ├── arrow-avro-8.0.0.dev165-javadoc.jar
        │       ├── arrow-avro-8.0.0.dev165-sources.jar
        │       ├── arrow-avro-8.0.0.dev165-tests.jar
        │       ├── arrow-avro-8.0.0.dev165.jar
        │       └── arrow-avro-8.0.0.dev165.pom
```
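The filenames in the listing above appear to follow Maven's `<artifactId>-<version>[-<classifier>].<extension>` naming. Here is a small sketch of splitting a name into those parts, which may be handy if the values are later passed to something like `mvn install:install-file`. `parseAssetName` is a hypothetical helper, and the version pattern is an assumption based on `8.0.0.dev165`:

```js
// Hypothetical helper: split an asset filename into Maven coordinates.
function parseAssetName(filename) {
  // Assumes <artifactId>-<version>[-<classifier>].<extension>,
  // with versions shaped like 8.0.0.dev165.
  const pattern =
    /^(?<artifactId>.+)-(?<version>\d+\.\d+\.\d+\.dev\d+)(?:-(?<classifier>[A-Za-z-]+))?\.(?<extension>[a-z]+)$/
  const match = filename.match(pattern)
  if (!match) return null
  // An absent classifier comes back as undefined; normalize it to null.
  const { artifactId, version, classifier = null, extension } = match.groups
  return { artifactId, version, classifier, extension }
}

console.log(parseAssetName("arrow-avro-8.0.0.dev165-javadoc.jar"))
console.log(parseAssetName("arrow-avro-8.0.0.dev165.pom"))
```

Keeping the classifier separate from the extension avoids treating `-javadoc` or `-sources` as part of the packaging type.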