The reference designs for AI and Generative AI include both documentation and code, and are intended primarily for platform teams. Platform teams can use them to give application teams and developers a self-service experience for provisioning the infrastructure required for AI and Generative AI.
The reference designs assume a simple two-step process:
1. The platform team imports the provided environment template(s) into their Rafay Org, configures them with the required credentials (for AWS, etc.), and shares them with the downstream projects that developers and data scientists can access.
2. The developer logs in and creates an environment based on the published environment template for AI or Generative AI.
The sequence diagram below shows the high-level steps.
sequenceDiagram
    autonumber
    participant admin as Platform Team
    participant rafay as Rafay
    participant user as Developer
    rect rgb(191, 223, 255)
    Note over admin,rafay: Setup Environment Template <br> for AI/Generative AI
    admin->>admin: Clone Git Repo
    admin->>rafay: Setup Environment Template
    admin->>rafay: Provide Credentials <br>(Infrastructure & LLM)
    end
    rect rgb(191, 223, 255)
    Note over rafay,user: Provision <br> AI/Generative AI Environment
    user->>rafay: Create Environment <br> based on Environment Template
    user->>rafay: Use Environment
    user->>rafay: Destroy Environment
    end
The sample Generative AI applications we currently provide are containerized, and the designs and templates we provide use either Amazon ECS or Amazon EKS for infrastructure.
Based on Amazon ECS
Provisioning the Amazon ECS based environment takes 5 to 10 minutes. Because provisioning is this quick, it makes sense to give the app developer a complete self-service experience in which they provision single-tenant ECS clusters on demand, with the Generative AI application deployed on the cluster as an ECS Task.
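To make "deployed as an ECS Task" more concrete, the sketch below builds the kind of task definition document such a deployment might use. It is an illustration under stated assumptions, not the template shipped with the reference design: the Fargate launch type, the CPU and memory sizes, the container port, and every name, image URI, and role ARN are hypothetical placeholders. It also assumes the application gets its AWS permissions through the task role, which the reference design does not spell out.

```python
import json


def genai_task_definition(family, image, task_role_arn, execution_role_arn,
                          container_port=8080):
    """Build a minimal ECS task definition for one containerized app.

    All sizing and launch-type choices here are assumptions made for
    illustration; adjust them to the actual application.
    """
    return {
        "family": family,
        # Fargate tasks must use the awsvpc network mode.
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        # 0.5 vCPU with 1 GiB memory is a valid Fargate size combination.
        "cpu": "512",
        "memory": "1024",
        # Role assumed by the application code (hypothetical ARN supplied by caller).
        "taskRoleArn": task_role_arn,
        # Role used by ECS itself to pull the image and write logs.
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "portMappings": [
                    {"containerPort": container_port, "protocol": "tcp"}
                ],
            }
        ],
    }


if __name__ == "__main__":
    task_def = genai_task_definition(
        family="genai-demo",
        image="111122223333.dkr.ecr.us-west-2.amazonaws.com/genai-demo:latest",
        task_role_arn="arn:aws:iam::111122223333:role/genai-task-role",
        execution_role_arn="arn:aws:iam::111122223333:role/ecsTaskExecutionRole",
    )
    print(json.dumps(task_def, indent=2))
```

A dictionary of this shape could be passed as keyword arguments to the boto3 ECS client's `register_task_definition` call, or rendered to JSON for an infrastructure-as-code tool.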
Based on Amazon EKS
Provisioning an Amazon EKS cluster based environment can take approximately 30 to 40 minutes. Kubernetes clusters are, however, well suited to multi-tenancy. As a result, we recommend that the platform engineer provision a Kubernetes namespace for every user/developer and create an IRSA (IAM Roles for Service Accounts) in it. The IRSA ensures that the Generative AI application the developer deploys to the namespace has the permissions it needs to programmatically access the LLMs on Amazon Bedrock.
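The IRSA arrangement described above rests on two IAM documents: a trust policy that lets one specific Kubernetes service account assume an IAM role through the cluster's OIDC provider, and a permissions policy that allows calls to Amazon Bedrock. The following is a minimal sketch of both, not the policies shipped with the reference design. The account ID, OIDC provider ID, namespace, and service account name are hypothetical placeholders, and the wildcard resource is an assumption that would normally be narrowed to specific model ARNs.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Trust policy that lets a single service account assume the role.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix,
    e.g. "oidc.eks.<region>.amazonaws.com/id/<id>".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": (
                        f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                    )
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        # Restrict the role to one namespace and service account.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                    }
                },
            }
        ],
    }


def bedrock_invoke_policy():
    """Permissions policy allowing model invocation on Amazon Bedrock.

    Resource "*" is an assumption for brevity; scope it down in practice.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    trust = irsa_trust_policy(
        account_id="111122223333",
        oidc_provider="oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",
        namespace="dev-alice",
        service_account="genai-app",
    )
    print(json.dumps(trust, indent=2))
    print(json.dumps(bedrock_invoke_policy(), indent=2))
```

Scoping the `sub` condition to a single namespace and service account is what keeps tenants on a shared cluster isolated: a pod in another developer's namespace cannot assume the role. The service account itself would carry the `eks.amazonaws.com/role-arn` annotation pointing at the role.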
This reference design is an initial version. We plan to progressively enhance the design with additional functionality based on our roadmap and customer feedback. Please watch this space or our product blogs for updates.