Posted to users@solr.apache.org by Hakan Özler <oz...@gmail.com> on 2023/02/10 16:49:08 UTC

Couldn't restore the backed up config set with the data from AWS S3

Hi All,

Solr 9.1.1 currently doesn't let me do a full restore of a backup whose
data and config set are stored in an S3 bucket. Every run fails with the
error "The specified key does not exist". The full message is:

An AmazonServiceException was thrown! [serviceName=S3] [awsRequestId=2C6] [httpStatus=404] [s3ErrorCode=NoSuchKey] [message=The specified key does not exist.]


After investigating further, I found that the path which the isDirectory
<https://github.com/apache/solr/blob/branch_9_1/solr/modules/s3-repository/src/java/org/apache/solr/s3/S3StorageClient.java#L323>
method uses to check whether a key is a directory makes the
`S3Client.headObject`
<https://github.com/apache/solr/blob/branch_9_1/solr/modules/s3-repository/src/java/org/apache/solr/s3/S3StorageClient.java#L328>
call fail. On line 324
<https://github.com/apache/solr/blob/branch_9_1/solr/modules/s3-repository/src/java/org/apache/solr/s3/S3StorageClient.java#L324>,
a path that points to a file is turned into a path that ends with a slash.
For example, when the path is
"path1/path2/backup-name/collection-name/zk_backup_0/configs/config-set-v1/configoverlay.json",
`sanitizedDirPath` appends a slash ("/") to it, producing
"path1/path2/backup-name/collection-name/zk_backup_0/configs/config-set-v1/configoverlay.json/".
Although I can restore the backup when the cluster already has the config
set in ZooKeeper, I cannot restore the backed-up config set files into an
empty cluster because of this error.
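
To make the transformation concrete, here is a minimal sketch. It is not the
Solr source: `DirPathSketch` and `toDirKey` are hypothetical names that only
mimic the "append a trailing slash" step described above, so the resulting
key can be compared with the object that actually exists in the bucket.

```java
class DirPathSketch {
    // Hypothetical stand-in for the sanitizing step: treat the path as a
    // directory key by making sure it ends with a slash.
    static String toDirKey(String path) {
        return path.endsWith("/") ? path : path + "/";
    }

    public static void main(String[] args) {
        String fileKey =
            "path1/path2/backup-name/collection-name/zk_backup_0/configs/config-set-v1/configoverlay.json";
        // The bucket holds the key without the trailing slash, so a lookup
        // on the key printed here finds nothing.
        System.out.println(toDirKey(fileKey));
    }
}
```

The printed key ends in "configoverlay.json/", which does not match any
object in the backup, consistent with the NoSuchKey response shown above.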

For completeness, here are the other relevant details:

Backup definition:

  <backup>
    <repository name="s3-repo" class="org.apache.solr.s3.S3BackupRepository" default="false">
      <str name="s3.bucket.name">com.dev.bucket.backup.folder</str>
      <str name="s3.region">us-east-2</str>
    </repository>
  </backup>


The backup folder structure on S3:

.
└── bucket-name
    └── path1
        └── path2
            └── backup-name
                └── collection-name
                    ├── backup_0.properties
                    ├── index ...
                    ├── shard_backup_metadata
                    │   └── md_shard1_0.json
                    └── zk_backup_0
                        ├── collection_state.json
                        └── configs
                            └── config-set-v1
                                ├── configoverlay.json
                                ├── solrconfig.xml
                                ├── stopwords.txt
                                └── synonyms.txt


The cURL request I use for restore:

curl -i -X POST \
   -H "Content-Type: application/json" \
   -d \
'{
  "restore-collection": {
    "name": "backup-name",
    "collection": "collection-name-restored",
    "location": "path1/path2/",
    "repository": "s3-repo"
  }
}' \
 'http://localhost:8983/api/c'

Could you please point me in the right direction regarding this issue?

Thanks!
Hakan

Re: Couldn't restore the backed up config set with the data from AWS S3

Posted by Hakan Özler <oz...@gmail.com>.
I have created the ticket: https://issues.apache.org/jira/browse/SOLR-16670

On Fri, 17 Feb 2023 at 22:08, Hakan Özler <oz...@gmail.com> wrote:

> I have tried this with 9.1.0 as well, but received the same error.
> I also suspect the same cause you described.
>
> > Could you create a JIRA issue for this?
>
> Sure, I'll do it.
>
> Thanks!
> Hakan

Re: Couldn't restore the backed up config set with the data from AWS S3

Posted by Hakan Özler <oz...@gmail.com>.
I have tried this with 9.1.0 as well, but received the same error.
I also suspect the same cause you described.

> Could you create a JIRA issue for this?

Sure, I'll do it.

Thanks!
Hakan

On Fri, 17 Feb 2023 at 20:54, Houston Putman <ho...@apache.org> wrote:

> Did this work in a previous version (9.1.0, 9.0) for you, or have you just
> started trying it with 9.1.1?
>
> We had a change to some ZK/Backup integration in 9.1.1, but I don't think
> that's the issue here.
> Instead it looks like the AWS APIs return an error when they previously did
> not.
>
> Overall it looks like the code should be swallowing the error and returning
> that the directory does not exist.
>
> Could you create a JIRA issue for this?
>
> - Houston

Re: Couldn't restore the backed up config set with the data from AWS S3

Posted by Houston Putman <ho...@apache.org>.
Did this work in a previous version (9.1.0, 9.0) for you, or have you just
started trying it with 9.1.1?

We had a change to some ZK/Backup integration in 9.1.1, but I don't think
that's the issue here.
Instead it looks like the AWS APIs return an error when they previously did
not.

Overall it looks like the code should be swallowing the error and returning
that the directory does not exist.
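
A minimal sketch of that idea follows. It is not the actual Solr or AWS SDK
code: `ObjectStore`, `NoSuchKeyException`, `inMemory` and the directory
content-type marker are hypothetical stand-ins, introduced so the example
runs without an S3 dependency. The point is only that a "no such key" answer
for the slash-terminated key is treated as "not a directory" and not as a
failure.

```java
import java.util.Map;

class IsDirectorySketch {
    // Hypothetical stand-in for the 404 / NoSuchKey error reported by the S3 client.
    static class NoSuchKeyException extends RuntimeException {
        NoSuchKeyException(String key) {
            super("The specified key does not exist: " + key);
        }
    }

    // Hypothetical minimal object store: returns a key's content type or throws.
    interface ObjectStore {
        String headContentType(String key);
    }

    // Assumed marker for directory objects; the real value may differ.
    static final String DIRECTORY_CONTENT_TYPE = "application/x-directory";

    static boolean isDirectory(ObjectStore store, String path) {
        String dirKey = path.endsWith("/") ? path : path + "/";
        try {
            return DIRECTORY_CONTENT_TYPE.equals(store.headContentType(dirKey));
        } catch (NoSuchKeyException e) {
            // No object under "path/": report "not a directory" and let the caller continue.
            return false;
        }
    }

    // In-memory store built from a key -> content type map, for demonstration only.
    static ObjectStore inMemory(Map<String, String> objects) {
        return key -> {
            String type = objects.get(key);
            if (type == null) {
                throw new NoSuchKeyException(key);
            }
            return type;
        };
    }

    public static void main(String[] args) {
        ObjectStore store = inMemory(Map.of(
            "configs/config-set-v1/", DIRECTORY_CONTENT_TYPE,
            "configs/config-set-v1/configoverlay.json", "application/json"));
        System.out.println(isDirectory(store, "configs/config-set-v1"));                    // true
        System.out.println(isDirectory(store, "configs/config-set-v1/configoverlay.json")); // false
    }
}
```

With this shape, a file path such as configoverlay.json yields false from
the directory check, so the caller can go on to treat it as a regular file.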

Could you create a JIRA issue for this?

- Houston
