Posted to users@hop.apache.org by David Hughes <dh...@octavebio.com> on 2022/01/26 17:56:53 UTC

AWS S3 Integration

I have AWS IAM credentials in ~/.aws on my Mac and tried to access a CSV file by
choosing File/Open, entering s3://, and refreshing. I get a "file not
found" error indicating that Hop is looking in my local file system. Has
anyone been able to get S3 file reading configured and working properly? I
would appreciate any insight you can provide.

-- 
David Hughes

Re: AWS S3 Integration

Posted by Matt Casters <ma...@neo4j.com>.
Well, actually, you can copy the AWS libraries from the 1.1.0 install
in /opt/hop/plugins/tech/aws/lib over to your 1.0.0 install.
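Matt's workaround amounts to copying jar files from one install to another. The following is a minimal Python sketch that assumes both installs use the layout Matt mentions (plugins/tech/aws/lib under the Hop home directory) and that the libraries are plain .jar files. The function name and the 1.0.0 install path are hypothetical.

```python
import shutil
from pathlib import Path

# Location of the AWS plugin libraries relative to a Hop home directory,
# taken from Matt's path /opt/hop/plugins/tech/aws/lib.
AWS_LIB_SUBDIR = Path("plugins") / "tech" / "aws" / "lib"


def copy_aws_libs(src_hop_home, dst_hop_home) -> list:
    """Copy every .jar from the 1.1.0 AWS plugin lib folder into another install.

    Jars with the same name in the destination are overwritten; non-jar files
    are ignored. Returns the sorted names of the copied jars.
    """
    src = Path(src_hop_home) / AWS_LIB_SUBDIR
    dst = Path(dst_hop_home) / AWS_LIB_SUBDIR
    if not src.is_dir():
        raise FileNotFoundError(f"no AWS plugin libraries found in {src}")
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(src.glob("*.jar")):
        shutil.copy2(jar, dst / jar.name)  # copy2 also preserves timestamps
        copied.append(jar.name)
    return copied
```

For example, `copy_aws_libs(Path("/opt/hop"), Path("/path/to/hop-1.0.0"))` would copy the jars, where the second path stands in for wherever 1.0.0 is installed. Hop most likely needs to be restarted afterward so that the plugin libraries are picked up.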

Cheers,
Matt

On Wed, Jan 26, 2022 at 9:13 PM Bart Maertens <ba...@know.bi> wrote:

> Hi David,
>
> Hop 1.0 was released about 4 months ago. There's no way we can modify
> previous releases.
>
> Upgrading to 1.1.0 is quick, will fix your S3 problem, and will give you a
> lot of bug fixes and new functionality for Neo4j and in general.
>
> Regards,
> Bart
>
> On Wed, Jan 26, 2022 at 8:12 PM David Hughes <dh...@octavebio.com>
> wrote:
>
>> Hi Matt,
>>
>> Wow, thank you for responding so quickly, and in person! I am on v1.0.0
>> (congratulations btw). I followed the docs and receive the error message
>> that I described.
>>
>> Error browsing to location:
>> 's3://octave-domo-data/patientgraph/reference/ccs_dx_icd10cm_2019_1.csv'
>> FileNotFolderException: Could not list the contents of
>> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
>> because it is not a folder.
>> Root cause: FileNotFolderException: Could not list the contents of
>> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
>> because it is not a folder.
>>
>> I am excited to be using HOP. My intent is to use HOP to ETL my Neo4j
>> loading and even GDS processing. So far I have build a knowledgegraph and
>> ontology via hop using local files but want to schedule/automate the
>> process from S3. After I get that working I will move on to considering how
>> best to write unittest post Neo4j loading. I saw the unittest feature but
>> do not think it will meet my use case where I want to run a cypher query
>> checking for orphaned nodes for example and assert that the count is 0.
>>
>> Thank you for your insights on how to get S3 reading working in v1.0.0
>>
>> Regards,
>>
>> David
>>
>> On Wed, Jan 26, 2022 at 11:02 AM Matt Casters <ma...@neo4j.com>
>> wrote:
>>
>>> Hi David,
>>>
>>> Unfortunately version 1.0.0 had a missing AWS library.  It was
>>> a packaging bug.
>>> But a little bird told me that there's a newer version online at
>>> https://hop.apache.org/download/
>>> So if you could try that one you'll probably be more successful.
>>>
>>> If you're on 1.1.0 already then the docs are at:
>>> https://hop.apache.org/manual/latest/vfs/aws-s3-vfs.html
>>> Maybe those can help.
>>>
>>> Good luck!
>>>
>>> Matt
>>>
>>> On Wed, Jan 26, 2022 at 6:57 PM David Hughes <dh...@octavebio.com>
>>> wrote:
>>>
>>>> I have AWS IAM credentials in ~/.aws on my mac and tried to access a
>>>> csv by choosing file/open and entering s3:// and refreshing. I get a file
>>>> not found error indicating the HOP is looking in my local file system. Has
>>>> anyone been able to get S3 file reading configured and working properly? I
>>>> am appreciative of any insight you can provide.
>>>>
>>>> --
>>>> David Hughes
>>>>
>>>
>>>
>>> --
>>> Neo4j Chief Solutions Architect
>>> *✉   *matt.casters@neo4j.com
>>>
>>>
>>>
>>>
>>
>> --
>> David Hughes
>> Platform Architect
>> Octave Bioscience
>> www.octavebio.com
>>
>>

-- 
Neo4j Chief Solutions Architect
*✉   *matt.casters@neo4j.com

Re: AWS S3 Integration

Posted by David Hughes <dh...@octavebio.com>.
Hi Bart,

I’m upgrading now. I missed the minor version bump to 1.1 in Matt’s email; I was so
focused on the recent news of 1.0 :-)


On Wed, Jan 26, 2022 at 12:13 Bart Maertens <ba...@know.bi> wrote:

> Hi David,
>
> Hop 1.0 was released about 4 months ago. There's no way we can modify
> previous releases.
>
> Upgrading to 1.1.0 is quick, will fix your S3 problem, and will give you a
> lot of bug fixes and new functionality for Neo4j and in general.
>
> Regards,
> Bart
>
> On Wed, Jan 26, 2022 at 8:12 PM David Hughes <dh...@octavebio.com>
> wrote:
>
>> Hi Matt,
>>
>> Wow, thank you for responding so quickly, and in person! I am on v1.0.0
>> (congratulations btw). I followed the docs and receive the error message
>> that I described.
>>
>> Error browsing to location:
>> 's3://octave-domo-data/patientgraph/reference/ccs_dx_icd10cm_2019_1.csv'
>> FileNotFolderException: Could not list the contents of
>> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
>> because it is not a folder.
>> Root cause: FileNotFolderException: Could not list the contents of
>> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
>> because it is not a folder.
>>
>> I am excited to be using HOP. My intent is to use HOP to ETL my Neo4j
>> loading and even GDS processing. So far I have build a knowledgegraph and
>> ontology via hop using local files but want to schedule/automate the
>> process from S3. After I get that working I will move on to considering how
>> best to write unittest post Neo4j loading. I saw the unittest feature but
>> do not think it will meet my use case where I want to run a cypher query
>> checking for orphaned nodes for example and assert that the count is 0.
>>
>> Thank you for your insights on how to get S3 reading working in v1.0.0
>>
>> Regards,
>>
>> David
>>
>> On Wed, Jan 26, 2022 at 11:02 AM Matt Casters <ma...@neo4j.com>
>> wrote:
>>
>>> Hi David,
>>>
>>> Unfortunately version 1.0.0 had a missing AWS library.  It was
>>> a packaging bug.
>>> But a little bird told me that there's a newer version online at
>>> https://hop.apache.org/download/
>>> So if you could try that one you'll probably be more successful.
>>>
>>> If you're on 1.1.0 already then the docs are at:
>>> https://hop.apache.org/manual/latest/vfs/aws-s3-vfs.html
>>> Maybe those can help.
>>>
>>> Good luck!
>>>
>>> Matt
>>>
>>> On Wed, Jan 26, 2022 at 6:57 PM David Hughes <dh...@octavebio.com>
>>> wrote:
>>>
>>>> I have AWS IAM credentials in ~/.aws on my mac and tried to access a
>>>> csv by choosing file/open and entering s3:// and refreshing. I get a file
>>>> not found error indicating the HOP is looking in my local file system. Has
>>>> anyone been able to get S3 file reading configured and working properly? I
>>>> am appreciative of any insight you can provide.
>>>>
>>>> --
>>>> David Hughes
>>>>
>>>
>>>
>>> --
>>> Neo4j Chief Solutions Architect
>>> *✉   *matt.casters@neo4j.com
>>>
>>>
>>>
>>>
>>
>> --
>> David Hughes
>> Platform Architect
>> Octave Bioscience
>> www.octavebio.com
>>
>>

-- 
David Hughes
Principal ML/Graph Data Engineer
Octave Bioscience
www.octavebio.com

Re: AWS S3 Integration

Posted by Bart Maertens <ba...@know.bi>.
Hi David,

Hop 1.0 was released about 4 months ago. There's no way we can modify
previous releases.

Upgrading to 1.1.0 is quick, will fix your S3 problem, and will give you a
lot of bug fixes and new functionality for Neo4j and in general.

Regards,
Bart

On Wed, Jan 26, 2022 at 8:12 PM David Hughes <dh...@octavebio.com> wrote:

> Hi Matt,
>
> Wow, thank you for responding so quickly, and in person! I am on v1.0.0
> (congratulations btw). I followed the docs and receive the error message
> that I described.
>
> Error browsing to location:
> 's3://octave-domo-data/patientgraph/reference/ccs_dx_icd10cm_2019_1.csv'
> FileNotFolderException: Could not list the contents of
> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
> because it is not a folder.
> Root cause: FileNotFolderException: Could not list the contents of
> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
> because it is not a folder.
>
> I am excited to be using HOP. My intent is to use HOP to ETL my Neo4j
> loading and even GDS processing. So far I have build a knowledgegraph and
> ontology via hop using local files but want to schedule/automate the
> process from S3. After I get that working I will move on to considering how
> best to write unittest post Neo4j loading. I saw the unittest feature but
> do not think it will meet my use case where I want to run a cypher query
> checking for orphaned nodes for example and assert that the count is 0.
>
> Thank you for your insights on how to get S3 reading working in v1.0.0
>
> Regards,
>
> David
>
> On Wed, Jan 26, 2022 at 11:02 AM Matt Casters <ma...@neo4j.com>
> wrote:
>
>> Hi David,
>>
>> Unfortunately version 1.0.0 had a missing AWS library.  It was
>> a packaging bug.
>> But a little bird told me that there's a newer version online at
>> https://hop.apache.org/download/
>> So if you could try that one you'll probably be more successful.
>>
>> If you're on 1.1.0 already then the docs are at:
>> https://hop.apache.org/manual/latest/vfs/aws-s3-vfs.html
>> Maybe those can help.
>>
>> Good luck!
>>
>> Matt
>>
>> On Wed, Jan 26, 2022 at 6:57 PM David Hughes <dh...@octavebio.com>
>> wrote:
>>
>>> I have AWS IAM credentials in ~/.aws on my mac and tried to access a csv
>>> by choosing file/open and entering s3:// and refreshing. I get a file not
>>> found error indicating the HOP is looking in my local file system. Has
>>> anyone been able to get S3 file reading configured and working properly? I
>>> am appreciative of any insight you can provide.
>>>
>>> --
>>> David Hughes
>>>
>>
>>
>> --
>> Neo4j Chief Solutions Architect
>> *✉   *matt.casters@neo4j.com
>>
>>
>>
>>
>
> --
> David Hughes
> Platform Architect
> Octave Bioscience
> www.octavebio.com
>
>

Re: AWS S3 Integration

Posted by Hans Van Akelyen <ha...@gmail.com>.
Hi David,

One thing we do not have yet is a listing of the buckets, so the shortest
path you can browse to is s3://bucket-name, and that should work.
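Because buckets cannot be listed, the File/Open dialog needs at least the bucket part of the URI. This short Python sketch extracts the bucket-level URI (s3://bucket-name) from any s3:// path. The helper name is hypothetical, and it only uses the standard library's URL parser.

```python
from urllib.parse import urlparse


def s3_bucket_root(uri: str) -> str:
    """Return the bucket-level URI (s3://bucket-name) for an s3:// path.

    Bucket listing is not available, so a path of at least this form is
    needed to start browsing; folders and files inside the bucket can then
    be opened from there.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3://bucket/... URI: {uri!r}")
    return f"s3://{parsed.netloc}"
```

For David's file, this would give s3://octave-domo-data as the location to type into the dialog.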

[image: Screenshot 2022-01-26 at 22.40.52.png]

Kind regards,
Hans



On Wed, 26 Jan 2022 at 21:53, David Hughes <dh...@octavebio.com> wrote:

> Hi Matt,
>
> Thank you for the updated information on performance testing, I give that
> a try! I am now on v1.1.0 and have attached a screen shot of a File/Open
> operation from the GUI that neither performs a listing of S3 or errors. I
> have a credentials file in my ~/.aws/ which has several profiles that have
> access to S3. Is there a was to configure which profile Hop should use?
> Thank you for your help in getting S3 connections working.
>
> Regards,
>
> David
>
> On Wed, Jan 26, 2022 at 12:42 PM Matt Casters <ma...@neo4j.com>
> wrote:
>
>> I am excited to be using HOP. My intent is to use HOP to ETL my Neo4j
>>> loading and even GDS processing. So far I have build a knowledgegraph and
>>> ontology via hop using local files but want to schedule/automate the
>>> process from S3. After I get that working I will move on to considering how
>>> best to write unittest post Neo4j loading. I saw the unittest feature but
>>> do not think it will meet my use case where I want to run a cypher query
>>> checking for orphaned nodes for example and assert that the count is 0.
>>
>>
>> First of all, there have been a number of improvements to the Neo4j
>> plugins in 1.1.0, in particular to the Neo4j Graph Output transform.
>> Second, we run integration tests against a Neo4j docker container every
>> night with unit tests.
>>
>>
>> https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/lastCompletedBuild/testReport/(root)/neo4j/
>>
>> The workflows and pipelines for that are located here:
>> https://github.com/apache/hop/tree/master/integration-tests/neo4j
>>
>> So in your case you would either run the count in Neo4j or in Hop and
>> compare to a golden record with 0 in it.  Or you can pass any output to an
>> Abort transform... There are many ways to test these things.
>>
>> Cheers,
>> Matt
>>
>> On Wed, Jan 26, 2022 at 8:12 PM David Hughes <dh...@octavebio.com>
>> wrote:
>>
>>> Hi Matt,
>>>
>>> Wow, thank you for responding so quickly, and in person! I am on v1.0.0
>>> (congratulations btw). I followed the docs and receive the error message
>>> that I described.
>>>
>>> Error browsing to location:
>>> 's3://octave-domo-data/patientgraph/reference/ccs_dx_icd10cm_2019_1.csv'
>>> FileNotFolderException: Could not list the contents of
>>> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
>>> because it is not a folder.
>>> Root cause: FileNotFolderException: Could not list the contents of
>>> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
>>> because it is not a folder.
>>>
>>> I am excited to be using HOP. My intent is to use HOP to ETL my Neo4j
>>> loading and even GDS processing. So far I have build a knowledgegraph and
>>> ontology via hop using local files but want to schedule/automate the
>>> process from S3. After I get that working I will move on to considering how
>>> best to write unittest post Neo4j loading. I saw the unittest feature but
>>> do not think it will meet my use case where I want to run a cypher query
>>> checking for orphaned nodes for example and assert that the count is 0.
>>>
>>> Thank you for your insights on how to get S3 reading working in v1.0.0
>>>
>>> Regards,
>>>
>>> David
>>>
>>> On Wed, Jan 26, 2022 at 11:02 AM Matt Casters <ma...@neo4j.com>
>>> wrote:
>>>
>>>> Hi David,
>>>>
>>>> Unfortunately version 1.0.0 had a missing AWS library.  It was
>>>> a packaging bug.
>>>> But a little bird told me that there's a newer version online at
>>>> https://hop.apache.org/download/
>>>> So if you could try that one you'll probably be more successful.
>>>>
>>>> If you're on 1.1.0 already then the docs are at:
>>>> https://hop.apache.org/manual/latest/vfs/aws-s3-vfs.html
>>>> Maybe those can help.
>>>>
>>>> Good luck!
>>>>
>>>> Matt
>>>>
>>>> On Wed, Jan 26, 2022 at 6:57 PM David Hughes <dh...@octavebio.com>
>>>> wrote:
>>>>
>>>>> I have AWS IAM credentials in ~/.aws on my mac and tried to access a
>>>>> csv by choosing file/open and entering s3:// and refreshing. I get a file
>>>>> not found error indicating the HOP is looking in my local file system. Has
>>>>> anyone been able to get S3 file reading configured and working properly? I
>>>>> am appreciative of any insight you can provide.
>>>>>
>>>>> --
>>>>> David Hughes
>>>>>
>>>>
>>>>
>>>> --
>>>> Neo4j Chief Solutions Architect
>>>> *✉   *matt.casters@neo4j.com
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> David Hughes
>>> Platform Architect
>>> Octave Bioscience
>>> www.octavebio.com
>>>
>>>
>>
>> --
>> Neo4j Chief Solutions Architect
>> *✉   *matt.casters@neo4j.com
>>
>>
>>
>>
>
> --
> David Hughes
> Platform Architect
> Octave Bioscience
> www.octavebio.com
>
>

Re: AWS S3 Integration

Posted by David Hughes <dh...@octavebio.com>.
Hi Matt,

Thank you for the updated information on performance testing, I give that a
try! I am now on v1.1.0 and have attached a screen shot of a File/Open
operation from the GUI that neither performs a listing of S3 or errors. I
have a credentials file in my ~/.aws/ which has several profiles that have
access to S3. Is there a was to configure which profile Hop should use?
Thank you for your help in getting S3 connections working.
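This thread does not answer the profile question. One likely approach, which is an assumption here, is that Hop's S3 support resolves credentials through the AWS SDK for Java's default provider chain, and that chain honors the AWS_PROFILE environment variable. Under that assumption, the minimal Python sketch below lists the profiles in ~/.aws/credentials and builds an environment with AWS_PROFILE set, which could then be used to start Hop. The function names are hypothetical.

```python
import configparser
import os
from pathlib import Path

DEFAULT_CREDENTIALS = Path.home() / ".aws" / "credentials"


def list_profiles(credentials_file=DEFAULT_CREDENTIALS):
    """Return the profile names (the [section] headers) in an AWS credentials file."""
    # interpolation=None: treat values literally; we only need section names.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_file)  # a missing file is silently skipped -> []
    return parser.sections()


def env_for_profile(profile, credentials_file=DEFAULT_CREDENTIALS, base_env=None):
    """Return a copy of the environment with AWS_PROFILE set to `profile`.

    Assumption: the AWS SDK behind Hop's S3 support reads AWS_PROFILE when it
    looks up credentials through its default provider chain.
    """
    if profile not in list_profiles(credentials_file):
        raise KeyError(f"profile {profile!r} not found in {credentials_file}")
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_PROFILE"] = profile
    return env
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when launching hop-gui.sh. The plain shell equivalent is to export AWS_PROFILE before starting Hop. If Hop does not pick up the profile this way, the Hop S3 VFS documentation linked in this thread is the place to check.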

Regards,

David

On Wed, Jan 26, 2022 at 12:42 PM Matt Casters <ma...@neo4j.com>
wrote:

> I am excited to be using HOP. My intent is to use HOP to ETL my Neo4j
>> loading and even GDS processing. So far I have build a knowledgegraph and
>> ontology via hop using local files but want to schedule/automate the
>> process from S3. After I get that working I will move on to considering how
>> best to write unittest post Neo4j loading. I saw the unittest feature but
>> do not think it will meet my use case where I want to run a cypher query
>> checking for orphaned nodes for example and assert that the count is 0.
>
>
> First of all, there have been a number of improvements to the Neo4j
> plugins in 1.1.0, in particular to the Neo4j Graph Output transform.
> Second, we run integration tests against a Neo4j docker container every
> night with unit tests.
>
>
> https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/lastCompletedBuild/testReport/(root)/neo4j/
>
> The workflows and pipelines for that are located here:
> https://github.com/apache/hop/tree/master/integration-tests/neo4j
>
> So in your case you would either run the count in Neo4j or in Hop and
> compare to a golden record with 0 in it.  Or you can pass any output to an
> Abort transform... There are many ways to test these things.
>
> Cheers,
> Matt
>
> On Wed, Jan 26, 2022 at 8:12 PM David Hughes <dh...@octavebio.com>
> wrote:
>
>> Hi Matt,
>>
>> Wow, thank you for responding so quickly, and in person! I am on v1.0.0
>> (congratulations btw). I followed the docs and receive the error message
>> that I described.
>>
>> Error browsing to location:
>> 's3://octave-domo-data/patientgraph/reference/ccs_dx_icd10cm_2019_1.csv'
>> FileNotFolderException: Could not list the contents of
>> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
>> because it is not a folder.
>> Root cause: FileNotFolderException: Could not list the contents of
>> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
>> because it is not a folder.
>>
>> I am excited to be using HOP. My intent is to use HOP to ETL my Neo4j
>> loading and even GDS processing. So far I have build a knowledgegraph and
>> ontology via hop using local files but want to schedule/automate the
>> process from S3. After I get that working I will move on to considering how
>> best to write unittest post Neo4j loading. I saw the unittest feature but
>> do not think it will meet my use case where I want to run a cypher query
>> checking for orphaned nodes for example and assert that the count is 0.
>>
>> Thank you for your insights on how to get S3 reading working in v1.0.0
>>
>> Regards,
>>
>> David
>>
>> On Wed, Jan 26, 2022 at 11:02 AM Matt Casters <ma...@neo4j.com>
>> wrote:
>>
>>> Hi David,
>>>
>>> Unfortunately version 1.0.0 had a missing AWS library.  It was
>>> a packaging bug.
>>> But a little bird told me that there's a newer version online at
>>> https://hop.apache.org/download/
>>> So if you could try that one you'll probably be more successful.
>>>
>>> If you're on 1.1.0 already then the docs are at:
>>> https://hop.apache.org/manual/latest/vfs/aws-s3-vfs.html
>>> Maybe those can help.
>>>
>>> Good luck!
>>>
>>> Matt
>>>
>>> On Wed, Jan 26, 2022 at 6:57 PM David Hughes <dh...@octavebio.com>
>>> wrote:
>>>
>>>> I have AWS IAM credentials in ~/.aws on my mac and tried to access a
>>>> csv by choosing file/open and entering s3:// and refreshing. I get a file
>>>> not found error indicating the HOP is looking in my local file system. Has
>>>> anyone been able to get S3 file reading configured and working properly? I
>>>> am appreciative of any insight you can provide.
>>>>
>>>> --
>>>> David Hughes
>>>>
>>>
>>>
>>> --
>>> Neo4j Chief Solutions Architect
>>> *✉   *matt.casters@neo4j.com
>>>
>>>
>>>
>>>
>>
>> --
>> David Hughes
>> Platform Architect
>> Octave Bioscience
>> www.octavebio.com
>>
>>
>
> --
> Neo4j Chief Solutions Architect
> *✉   *matt.casters@neo4j.com
>
>
>
>

-- 
David Hughes
Platform Architect
Octave Bioscience
www.octavebio.com

Re: AWS S3 Integration

Posted by Matt Casters <ma...@neo4j.com>.
>
> I am excited to be using HOP. My intent is to use HOP to ETL my Neo4j
> loading and even GDS processing. So far I have build a knowledgegraph and
> ontology via hop using local files but want to schedule/automate the
> process from S3. After I get that working I will move on to considering how
> best to write unittest post Neo4j loading. I saw the unittest feature but
> do not think it will meet my use case where I want to run a cypher query
> checking for orphaned nodes for example and assert that the count is 0.


First of all, there have been a number of improvements to the Neo4j plugins
in 1.1.0, in particular to the Neo4j Graph Output transform.
Second, we run integration tests against a Neo4j Docker container every
night, using unit tests.

https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/lastCompletedBuild/testReport/(root)/neo4j/

The workflows and pipelines for that are located here:
https://github.com/apache/hop/tree/master/integration-tests/neo4j

So in your case you would run the count either in Neo4j or in Hop and
compare it to a golden record containing 0. Alternatively, you can pass any
output rows to an Abort transform. There are many ways to test these things.
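In Hop itself, this check would be built from a pipeline and a golden data set or an Abort transform. To make the logic concrete outside Hop, here is a small Python sketch of the same idea: run a count of orphaned nodes and fail when the count differs from the golden value of 0. Treating "orphaned" as "a node with no relationships at all" is an assumption. The Cypher query uses Neo4j 4.x pattern-predicate syntax, and the names are hypothetical.

```python
# Count nodes that have no relationships at all (Neo4j 4.x pattern syntax).
ORPHAN_COUNT_QUERY = "MATCH (n) WHERE NOT (n)--() RETURN count(n) AS orphans"


class OrphanCheckFailed(Exception):
    """Raised when the orphan count differs from the golden value (like Abort)."""


def check_orphans(run_count, golden=0):
    """Run the orphan-count query and compare the result with a golden value.

    `run_count` is any callable that executes a Cypher string and returns the
    single integer it produces, so the check is not tied to a specific driver.
    """
    actual = run_count(ORPHAN_COUNT_QUERY)
    if actual != golden:
        raise OrphanCheckFailed(f"expected {golden} orphaned nodes, found {actual}")
    return actual
```

With the official `neo4j` Python driver, `run_count` could be `lambda q: session.run(q).single()["orphans"]` inside a `driver.session()` block, with your own connection details. Passing the query runner in as a callable keeps the check itself testable without a database.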

Cheers,
Matt

On Wed, Jan 26, 2022 at 8:12 PM David Hughes <dh...@octavebio.com> wrote:

> Hi Matt,
>
> Wow, thank you for responding so quickly, and in person! I am on v1.0.0
> (congratulations btw). I followed the docs and receive the error message
> that I described.
>
> Error browsing to location:
> 's3://octave-domo-data/patientgraph/reference/ccs_dx_icd10cm_2019_1.csv'
> FileNotFolderException: Could not list the contents of
> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
> because it is not a folder.
> Root cause: FileNotFolderException: Could not list the contents of
> "file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
> because it is not a folder.
>
> I am excited to be using HOP. My intent is to use HOP to ETL my Neo4j
> loading and even GDS processing. So far I have build a knowledgegraph and
> ontology via hop using local files but want to schedule/automate the
> process from S3. After I get that working I will move on to considering how
> best to write unittest post Neo4j loading. I saw the unittest feature but
> do not think it will meet my use case where I want to run a cypher query
> checking for orphaned nodes for example and assert that the count is 0.
>
> Thank you for your insights on how to get S3 reading working in v1.0.0
>
> Regards,
>
> David
>
> On Wed, Jan 26, 2022 at 11:02 AM Matt Casters <ma...@neo4j.com>
> wrote:
>
>> Hi David,
>>
>> Unfortunately version 1.0.0 had a missing AWS library.  It was
>> a packaging bug.
>> But a little bird told me that there's a newer version online at
>> https://hop.apache.org/download/
>> So if you could try that one you'll probably be more successful.
>>
>> If you're on 1.1.0 already then the docs are at:
>> https://hop.apache.org/manual/latest/vfs/aws-s3-vfs.html
>> Maybe those can help.
>>
>> Good luck!
>>
>> Matt
>>
>> On Wed, Jan 26, 2022 at 6:57 PM David Hughes <dh...@octavebio.com>
>> wrote:
>>
>>> I have AWS IAM credentials in ~/.aws on my mac and tried to access a csv
>>> by choosing file/open and entering s3:// and refreshing. I get a file not
>>> found error indicating the HOP is looking in my local file system. Has
>>> anyone been able to get S3 file reading configured and working properly? I
>>> am appreciative of any insight you can provide.
>>>
>>> --
>>> David Hughes
>>>
>>
>>
>> --
>> Neo4j Chief Solutions Architect
>> *✉   *matt.casters@neo4j.com
>>
>>
>>
>>
>
> --
> David Hughes
> Platform Architect
> Octave Bioscience
> www.octavebio.com
>
>

-- 
Neo4j Chief Solutions Architect
*✉   *matt.casters@neo4j.com

Re: AWS S3 Integration

Posted by David Hughes <dh...@octavebio.com>.
Hi Matt,

Wow, thank you for responding so quickly, and in person! I am on v1.0.0
(congratulations btw). I followed the docs and receive the error message
that I described.

Error browsing to location:
's3://octave-domo-data/patientgraph/reference/ccs_dx_icd10cm_2019_1.csv'
FileNotFolderException: Could not list the contents of
"file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
because it is not a folder.
Root cause: FileNotFolderException: Could not list the contents of
"file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference"
because it is not a folder.
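The path in this error shows what happened: with no working s3 provider (the 1.0.0 packaging bug Matt describes), "s3://..." was treated as a relative local path under the directory Hop was started from, and the "//" was collapsed into "/". The Python sketch below reproduces that path. It is only an illustration under those assumptions, not Hop's or Apache Commons VFS's actual resolution code, and the function name is hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def resolve_like_vfs(name, base_dir, registered_schemes):
    """Mimic how a URI with an unregistered scheme ends up on the local disk.

    If the scheme has a registered provider, the URI is returned unchanged.
    Otherwise the whole string is joined to `base_dir` as a relative path,
    and normalizing that path collapses the "//" after "s3:" into "/".
    """
    scheme = urlparse(name).scheme
    if scheme in registered_schemes:
        return name
    local = posixpath.normpath(posixpath.join(base_dir, name))
    return "file://" + local
```

Calling it with "s3://octave-domo-data/patientgraph/reference", the base directory /Users/davidhughes/servers/hop, and only "file" registered yields the same file:///Users/davidhughes/servers/hop/s3:/octave-domo-data/patientgraph/reference path that appears in the error.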

I am excited to be using Hop. My intent is to use Hop for the ETL of my Neo4j
loading and even GDS processing. So far I have built a knowledge graph and
ontology via Hop using local files, but I want to schedule/automate the
process from S3. After I get that working, I will move on to considering how
best to write unit tests after Neo4j loading. I saw the unit test feature but
do not think it will meet my use case, where I want to run a Cypher query
checking for orphaned nodes, for example, and assert that the count is 0.

Thank you for your insights on how to get S3 reading working in v1.0.0.

Regards,

David

On Wed, Jan 26, 2022 at 11:02 AM Matt Casters <ma...@neo4j.com>
wrote:

> Hi David,
>
> Unfortunately version 1.0.0 had a missing AWS library.  It was
> a packaging bug.
> But a little bird told me that there's a newer version online at
> https://hop.apache.org/download/
> So if you could try that one you'll probably be more successful.
>
> If you're on 1.1.0 already then the docs are at:
> https://hop.apache.org/manual/latest/vfs/aws-s3-vfs.html
> Maybe those can help.
>
> Good luck!
>
> Matt
>
> On Wed, Jan 26, 2022 at 6:57 PM David Hughes <dh...@octavebio.com>
> wrote:
>
>> I have AWS IAM credentials in ~/.aws on my mac and tried to access a csv
>> by choosing file/open and entering s3:// and refreshing. I get a file not
>> found error indicating the HOP is looking in my local file system. Has
>> anyone been able to get S3 file reading configured and working properly? I
>> am appreciative of any insight you can provide.
>>
>> --
>> David Hughes
>>
>
>
> --
> Neo4j Chief Solutions Architect
> *✉   *matt.casters@neo4j.com
>
>
>
>

-- 
David Hughes
Platform Architect
Octave Bioscience
www.octavebio.com

Re: AWS S3 Integration

Posted by Matt Casters <ma...@neo4j.com>.
Hi David,

Unfortunately, version 1.0.0 was missing an AWS library. It was
a packaging bug.
But a little bird told me that there's a newer version online at
https://hop.apache.org/download/
So if you could try that one, you'll probably be more successful.

If you're already on 1.1.0, the docs are at:
https://hop.apache.org/manual/latest/vfs/aws-s3-vfs.html
Maybe those can help.

Good luck!

Matt

On Wed, Jan 26, 2022 at 6:57 PM David Hughes <dh...@octavebio.com> wrote:

> I have AWS IAM credentials in ~/.aws on my mac and tried to access a csv
> by choosing file/open and entering s3:// and refreshing. I get a file not
> found error indicating the HOP is looking in my local file system. Has
> anyone been able to get S3 file reading configured and working properly? I
> am appreciative of any insight you can provide.
>
> --
> David Hughes
>


-- 
Neo4j Chief Solutions Architect
*✉   *matt.casters@neo4j.com