Upgrade Guide v1.5.1-aws-b1.0.2 to v1.7.0-aws-b1.0.1 (latest) #746

Open
sagi-shimoni opened this issue May 24, 2023 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@sagi-shimoni
Contributor

What is the URL of the document?
https://awslabs.github.io/kubeflow-manifests/docs/deployment/cognito-rds-s3/guide/

Which section(s) is the issue in?
N/A

What needs fixing and describe the solution you'd like?
A recommended upgrade guide, such as this one: https://karpenter.sh/v0.27.5/upgrade-guide/

Additional context
We are planning to upgrade from v1.5.1-aws-b1.0.2 to v1.7.0-aws-b1.0.1 (latest).
We are currently deployed using Cognito-RDS-S3 on EKS version 1.21.

It would be really helpful if there were an upgrade guide that summarizes breaking changes and recommended actions to take before upgrading versions. (Is a blue/green deployment required to confirm compatibility?)

We are not concerned about a short downtime. However, I'm worried that we will upgrade EKS to 1.23/1.24, run into compatibility issues, and be unable to revert.

Appreciate any input regarding this...

Thanks,
Sagi

@sagi-shimoni sagi-shimoni added the documentation Improvements or additions to documentation label May 24, 2023
@surajkota
Contributor

Hi @sagi-shimoni, thanks for creating the feature request. We are aware of this and already have it in our backlog as #161. We do not have an ETA currently but will take this into consideration. In the meantime, I recommend looking at this blog post, which describes a blue/green upgrade process implemented by one of our users: https://aws.amazon.com/blogs/machine-learning/build-repeatable-secure-and-extensible-end-to-end-machine-learning-workflows-using-kubeflow-on-aws/

Note that 1.5 and 1.7 support different EKS versions, so we recommend deploying a new EKS cluster with the latest version of Kubeflow and using AWS service capabilities to do a blue/green deployment (some of these are demonstrated in the blog post above), e.g.:

  • You can use the same Cognito user pool, since nothing changes on the auth side.
  • Back up or snapshot the storage layer using AWS Backup or the services' own snapshot capabilities (e.g. RDS, S3, EBS volumes). For NFS-style file systems like EFS and FSx, you can use access points.
  • For backing up and restoring Kubernetes resources (e.g. notebooks, PodDefaults, inference services), you can look into using Velero. It also helps with PVs.
  • If you are using SageMaker resources, you can adopt them into the new cluster by using the adopted resource feature.
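
As a rough illustration of the backup and restore bullets above, here is a minimal sketch that only assembles the CLI commands for one pass (it does not run them). The backup name, namespaces, and RDS identifiers are hypothetical placeholders, and it assumes Velero is installed in both the old and the new cluster with access to the same backup storage location. Treat it as a starting point, not a tested upgrade procedure.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class BlueGreenPlan:
    """Inputs for the backup/restore step. All values are hypothetical placeholders."""
    backup_name: str        # name given to the Velero backup
    namespaces: List[str]   # namespaces holding notebooks, PodDefaults, etc.
    rds_instance_id: str    # identifier of the existing RDS instance
    rds_snapshot_id: str    # name for the pre-upgrade RDS snapshot


def velero_backup_cmd(plan: BlueGreenPlan) -> List[str]:
    """Command to run against the OLD (blue) cluster."""
    return [
        "velero", "backup", "create", plan.backup_name,
        "--include-namespaces", ",".join(plan.namespaces),
    ]


def velero_restore_cmd(plan: BlueGreenPlan) -> List[str]:
    """Command to run against the NEW (green) cluster."""
    return ["velero", "restore", "create", "--from-backup", plan.backup_name]


def rds_snapshot_cmd(plan: BlueGreenPlan) -> List[str]:
    """Manual RDS snapshot taken before switching over."""
    return [
        "aws", "rds", "create-db-snapshot",
        "--db-instance-identifier", plan.rds_instance_id,
        "--db-snapshot-identifier", plan.rds_snapshot_id,
    ]


if __name__ == "__main__":
    # Hypothetical values; replace with the names used in your deployment.
    plan = BlueGreenPlan(
        backup_name="kubeflow-pre-upgrade",
        namespaces=["kubeflow-user-example-com"],
        rds_instance_id="kubeflow-db",
        rds_snapshot_id="kubeflow-db-pre-upgrade",
    )
    for cmd in (rds_snapshot_cmd(plan), velero_backup_cmd(plan), velero_restore_cmd(plan)):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything, and keeping the old cluster untouched until the restore is verified preserves a way back if the new EKS version turns out to be incompatible.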
