Overview

The reference designs for AI and Generative AI come with both documentation and code and are primarily intended for platform teams, who can use them to provide application teams and developers with a self-service experience for the infrastructure required for AI and Generative AI.

The reference designs assume a simple two-step process:

Step 1

The platform team imports the provided environment template(s) into their Rafay Org, configures them with the required credentials (for AWS, etc.), and shares them with downstream projects that developers and data scientists can access.

Step 2

The developer logs in and creates an environment based on the published environment template for AI or Generative AI.

The diagram below shows the high-level steps.

```mermaid
sequenceDiagram
    autonumber
    participant admin as Platform Team
    participant rafay as Rafay
    participant user as Developer


    rect rgb(191, 223, 255)
    Note over admin,rafay: Setup Environment Template <br> for AI/Generative AI
    admin->>admin: Clone Git Repo
    admin->>rafay: Setup Environment Template 
    admin->>rafay: Provide Credentials <br>(Infrastructure & LLM)
    end

    rect rgb(191, 223, 255)
    Note over rafay,user: Provision <br> AI/Generative AI Environment
    user->>rafay: Create Environment <br> based on Environment Template 
    user->>rafay: Use Environment
    user->>rafay: Destroy Environment
    end
```

Infrastructure Options

The sample Generative AI applications we currently provide are containerized, and the designs/templates we provide are based on Amazon ECS and Amazon EKS for infrastructure.

Based on Amazon ECS

Provisioning the Amazon ECS based environment takes 5-10 minutes. This makes it practical to give app developers a complete self-service experience in which they provision single-tenant ECS clusters on demand, with the Generative AI application deployed on them as an ECS Task.
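As a rough sketch of what deploying the containerized app as an ECS Task involves, the snippet below builds a Fargate task definition; the family name, image, ports, and resource sizes are illustrative placeholders, not values taken from the reference design.

```python
# Sketch of an ECS (Fargate) task definition for a containerized
# Generative AI app. All names and sizes here are assumptions; the
# actual reference design templates define their own values.
def build_task_definition(image: str) -> dict:
    return {
        "family": "genai-app",                    # placeholder family name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        "cpu": "1024",                            # 1 vCPU
        "memory": "2048",                         # 2 GiB
        "containerDefinitions": [
            {
                "name": "genai-app",
                "image": image,
                "essential": True,
                "portMappings": [
                    {"containerPort": 8080, "protocol": "tcp"}
                ],
            }
        ],
    }

# With credentials configured, this could be registered via boto3:
#   import boto3
#   boto3.client("ecs").register_task_definition(
#       **build_task_definition("123456789012.dkr.ecr.us-east-1.amazonaws.com/genai-app:latest")
#   )
```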

Based on Amazon EKS

Provisioning an Amazon EKS cluster based environment can take ~30-40 minutes. Kubernetes clusters are extremely well suited for multi-tenancy. As a result, we recommend that for every user/developer, the platform engineer provision a Kubernetes namespace and create an IRSA (IAM Role for Service Accounts) in it. The IRSA ensures that the Generative AI application the developer deploys to the namespace has the required permissions to programmatically access the LLMs on Amazon Bedrock.
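To make the IRSA flow concrete, here is a minimal sketch of how an application running in such a namespace might call a model on Amazon Bedrock. The model ID and request schema shown are assumptions for an Anthropic Claude model; check the Bedrock documentation for the models you have enabled.

```python
import json

def build_messages_request(prompt: str, max_tokens: int = 256) -> str:
    """Build a Bedrock request body for an Anthropic Claude model.

    The payload schema ("anthropic_version", "messages") is an
    assumption based on the Claude Messages API on Bedrock.
    """
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

# With IRSA in place, the pod's service account supplies AWS credentials
# automatically, so the application needs no static keys:
#
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   resp = client.invoke_model(
#       modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
#       body=build_messages_request("Hello!"),
#   )
```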


Roadmap

This reference design is an initial version. We plan to progressively enhance the design with additional functionality based on our roadmap and customer feedback. Please watch this space or our product blogs for updates.