In this section, you will query the models deployed using RayLLM.

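Step 1: Port-forward the Serve service in the kuberay namespace to local port 8000. The command runs in the foreground, so leave it running and use a second terminal for the next step.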

kubectl port-forward service/rayllm-serve-svc 8000:8000 -n kuberay
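
Before sending a query, it can help to confirm that the forward is actually listening. The following is a minimal sketch, not part of RayLLM or KubeRay: the helper name `is_port_open` is hypothetical, and the host and port are assumed to match the local side of the command above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Port 8000 is the local port used in the port-forward command above.
    print("forward active:", is_port_open("localhost", 8000))
```

A result of False usually means the port-forward command has exited or was started with a different local port.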

Step 2: Send a query to amazon/LightGPT.

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "amazon/LightGPT",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What are the top 5 most popular programming languages?"}
    ],
    "temperature": 0.7
  }'
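
The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl command above. The names `BASE_URL`, `build_chat_request`, and `send` are hypothetical helpers introduced for illustration, and the URL assumes the port-forward from Step 1 is still running.

```python
import json
import urllib.request

# Assumed endpoint: the local side of the port-forward from Step 1.
BASE_URL = "http://localhost:8000"


def build_chat_request(model: str, user_prompt: str,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build a POST request with the same body as the curl command above."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_chat_request("amazon/LightGPT", "What are the top 5 most popular programming languages?"))` should return the parsed response as a dictionary, provided the service is reachable.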


A successful request returns a JSON response similar to the following:

{"id":"amazon/LightGPT-336fc0f9f06af9cd182d7f7d9009427d","object":"text_completion","created":1700030387,"model":"amazon/LightGPT","choices":[{"message":{"role":"assistant","content":"1. Java\n2. C/C++\n3. C#\n4. JavaScript\n5. Python"},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":26,"completion_tokens":24,"total_tokens":50}}
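
The fields of interest in this response are the assistant message under `choices` and the token counts under `usage`. As a rough sketch, the snippet below parses a copy of the response above and extracts both. The helper name `summarize` is hypothetical, and the field layout is assumed to match the sample shown here.

```python
import json

# Body copied from the sample response above (raw strings keep the \n escapes intact).
SAMPLE = (
    r'{"id":"amazon/LightGPT-336fc0f9f06af9cd182d7f7d9009427d",'
    r'"object":"text_completion","created":1700030387,"model":"amazon/LightGPT",'
    r'"choices":[{"message":{"role":"assistant",'
    r'"content":"1. Java\n2. C/C++\n3. C#\n4. JavaScript\n5. Python"},'
    r'"index":0,"finish_reason":"stop"}],'
    r'"usage":{"prompt_tokens":26,"completion_tokens":24,"total_tokens":50}}'
)


def summarize(response: dict):
    """Return the first choice's assistant text and the total token count."""
    text = response["choices"][0]["message"]["content"]
    total = response["usage"]["total_tokens"]
    return text, total


text, total = summarize(json.loads(SAMPLE))
print(text)
print("total tokens:", total)
```

Reading `finish_reason` alongside the text is also useful: a value of `stop` in the sample indicates the model ended the answer on its own.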


Congratulations! You have successfully deployed the kuberay-operator on your managed Kubernetes cluster as an add-on in a custom cluster blueprint. You then queried the models deployed using RayLLM.