This deployment uses the Validated Patterns framework, taking advantage of GitOps for seamless provisioning of all operators and applications. It deploys a Chatbot application that harnesses the power of Large Language Models (LLMs) combined with the Retrieval-Augmented Generation (RAG) framework.
The pattern uses Red Hat OpenShift AI to deploy and serve LLM models at scale.
By default, this pattern uses pgvector as the RAG DB backend. EDB Postgres, Redis, Elasticsearch, and Microsoft SQL Server (either a local deployment as part of the pattern or an existing SQL Server DB on Azure) are also options for RAG DB backends.
This pattern populates your chosen RAG DB with documents relating to Red Hat OpenShift AI for the purpose of generating project proposals.
- Podman
- Red Hat OpenShift cluster running in AWS. Supported regions are: us-east-1 us-east-2 us-west-1 us-west-2 ca-central-1 sa-east-1 eu-west-1 eu-west-2 eu-west-3 eu-central-1 eu-north-1 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-southeast-1 ap-southeast-2 ap-south-1.
- Create a fork of the rag-llm-gitops Git repository.
- EDB Postgres Operator Credentials (required only if you select EDB): The EDB Postgres for Kubernetes operator from the certified-operators catalog requires authentication to pull images from docker.enterprisedb.com. You will need to:
  - Register for a free trial account at EDB Registration
  - Obtain your subscription token from EDB Repos Downloads
  - Add the token to your values-secret.yaml file during configuration (see below)
  For more details, see the EDB Installation Documentation.
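The supported-region list in the prerequisites can be checked mechanically before you start. The following is a minimal sketch, not part of the pattern itself: the helper name is_supported_region is hypothetical, and the region list is copied verbatim from the prerequisites.

```shell
# Supported AWS regions, copied from the prerequisites list.
SUPPORTED_REGIONS="us-east-1 us-east-2 us-west-1 us-west-2 ca-central-1 sa-east-1 eu-west-1 eu-west-2 eu-west-3 eu-central-1 eu-north-1 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-southeast-1 ap-southeast-2 ap-south-1"

# is_supported_region REGION  (hypothetical helper)
# Returns 0 if REGION appears in SUPPORTED_REGIONS as a whole word, 1 otherwise.
is_supported_region() {
  [ -n "$1" ] || return 1
  case " $SUPPORTED_REGIONS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_supported_region us-east-2 succeeds, while is_supported_region us-gov-west-1 fails. How you obtain the region of your cluster is up to you; the helper only compares a string against the list.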
The goal of this demo is to demonstrate a Chatbot LLM application augmented with data from Red Hat product documentation running on Red Hat OpenShift AI. It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM. The application generates a project proposal for a Red Hat product.
- Leveraging Red Hat OpenShift AI to deploy and serve LLM models powered by NVIDIA GPU accelerators.
- LLM Application augmented with content from Red Hat product documentation.
- Multiple LLM providers (OpenAI, Hugging Face, NVIDIA).
- Vector Database, such as EDB Postgres for Kubernetes or Redis, to store embeddings of Red Hat product documentation.
- Monitoring dashboard to provide key metrics such as ratings.
- GitOps setup to deploy end-to-end demo (frontend / vector database / served models).
Figure 1. Overview of the validated pattern for RAG Demo with Red Hat OpenShift
Figure 2. Logical diagram of the RAG Demo with Red Hat OpenShift.
Figure 3. Schematic diagram for workflow of RAG demo with Red Hat OpenShift.
Figure 4. Schematic diagram for Ingestion of data for RAG.
Figure 5. Schematic diagram for RAG demo augmented query.
Figure 5 shows the RAG augmented query. The IBM Granite 3.1-8B-Instruct model is used for language processing. LangChain integrates the tools of the LLM-based application and processes the PDF files and web pages. A vector database provider, such as EDB Postgres for Kubernetes or Redis, stores the vectors. Red Hat OpenShift AI serves the IBM Granite 3.1-8B-Instruct model, Gradio provides the user interface, and object storage holds the language model and other datasets. The solution components are deployed as microservices in the Red Hat OpenShift cluster.
View and download all of the diagrams above in our open source tooling site.
Figure 6. Proposed demo architecture with OpenShift AI
- vLLM Text Generation Inference Server: The pattern deploys a vLLM Inference Server. The server deploys and serves the ibm-granite/granite-3.1-8b-instruct model. The server requires a GPU node.
- EDB Postgres for Kubernetes / Redis Server: A vector database server is deployed to store vector embeddings created from Red Hat product documentation.
- Populate VectorDb Job: The job creates the embeddings and populates the vector database.
- LLM Application: This is a Chatbot application that can generate a project proposal by augmenting the LLM with the Red Hat product documentation stored in vector db.
- Prometheus: Deploys a Prometheus instance to store the various metrics from the LLM application and TGIS server.
- Grafana: Deploys Grafana application to visualize the metrics.
To run the demo, ensure that Podman is running on your machine. Fork the rag-llm-gitops repository into your organization.

Replace the token and the API server URL in the command below to log in to the OpenShift cluster.

oc login --token=<token> --server=<api_server_url> # log in to the OpenShift cluster
git clone https://github.com/<<your-username>>/rag-llm-gitops.git
cd rag-llm-gitops

This pattern deploys IBM Granite 3.3-8B-Instruct out of the box. Run the following command to configure Vault with the model ID.

# Copy values-secret.yaml.template to ~/values-secret-rag-llm-gitops.yaml.
# You should never check in these files.
# Add the secrets that need to be stored in the vault to the copied file.
cp values-secret.yaml.template ~/values-secret-rag-llm-gitops.yaml

To deploy a model that requires a Hugging Face token, obtain the token and accept the terms and conditions on the model page. Update the hftoken secret in ~/values-secret-rag-llm-gitops.yaml and set .global.model.vllm in values-global.yaml to your desired model.
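A plain cp overwrites an existing ~/values-secret-rag-llm-gitops.yaml, which would discard any tokens already entered there. The sketch below shows one way to make the copy step safe to repeat. The helper name init_secrets_file is hypothetical and is not provided by the pattern; restricting the file mode to 600 is a precaution added here, not a requirement stated by the pattern.

```shell
# init_secrets_file TEMPLATE TARGET  (hypothetical helper)
# Copies TEMPLATE to TARGET only if TARGET does not exist yet, then
# restricts the copy to the current user. An existing TARGET is left alone.
init_secrets_file() {
  template=$1
  target=$2
  if [ ! -f "$template" ]; then
    echo "template not found: $template" >&2
    return 1
  fi
  if [ -e "$target" ]; then
    echo "keeping existing $target"
    return 0
  fi
  cp "$template" "$target" && chmod 600 "$target"
}
```

From the repository root, this could be called as init_secrets_file values-secret.yaml.template "$HOME/values-secret-rag-llm-gitops.yaml".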
IMPORTANT: If you are using EDB Postgres for Kubernetes, you must add your EDB subscription token to ~/values-secret-rag-llm-gitops.yaml:

secrets:
  - name: hfmodel
    fields:
    - name: hftoken
      value: null
  - name: edb
    fields:
    - name: token
      value: "YOUR_EDB_TOKEN_HERE" # Replace with your EDB subscription token
      description: EDB subscription token for pulling certified operator images

The EDB token is synced into Vault and then used by External Secrets to create the required pull secret (postgresql-operator-pull-secret) in openshift-operators. Without this token, the EDB operator will fail to pull its container image and the database will not be created.
If you are using PGVector or SQL Server, you can update the password in this file. Otherwise, an autogenerated password is used.
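Because a missing EDB token only shows up later as a failed image pull, it can be worth checking the secrets file before installing. The following sketch assumes the file layout shown in the EDB example. The helper name edb_token_is_set is hypothetical, and the check is a plain text match rather than a YAML parse, so it cannot tell whether the token itself is valid.

```shell
# edb_token_is_set FILE  (hypothetical helper)
# Succeeds only if FILE exists, declares a secret named "edb", and no longer
# contains the YOUR_EDB_TOKEN_HERE placeholder.
edb_token_is_set() {
  [ -f "$1" ] || return 1
  grep -Eq '^[[:space:]]*-[[:space:]]*name:[[:space:]]*edb[[:space:]]*$' "$1" || return 1
  if grep -q 'YOUR_EDB_TOKEN_HERE' "$1"; then
    return 1
  fi
  return 0
}
```

A possible use is edb_token_is_set "$HOME/values-secret-rag-llm-gitops.yaml" || echo "EDB token missing or still a placeholder".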
As a prerequisite for deploying the application with this Validated Pattern, a GPU node must be provisioned. To provision the GPU node on AWS:

./pattern.sh make create-gpu-machineset

Wait until the node is provisioned and running.
Alternatively, follow the instructions to manually install the GPU node.
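One way to see whether the GPU node has finished provisioning is to compare the desired and ready replica counts of the machine sets. This is a sketch under assumptions: the name of the GPU machine set created by the make target is not stated here, so the helpers (both hypothetical names) report every machine set in openshift-machine-api that is still short of ready replicas. The .spec.replicas and .status.readyReplicas fields belong to the MachineSet resource; readyReplicas is absent until at least one machine is ready, which the filter treats as zero.

```shell
# machineset_status  (hypothetical helper)
# Prints one line per machine set: NAME DESIRED READY (READY may be empty).
machineset_status() {
  oc get machinesets -n openshift-machine-api \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.replicas}{" "}{.status.readyReplicas}{"\n"}{end}'
}

# unready_machinesets  (hypothetical helper)
# Reads "NAME DESIRED READY" lines on stdin and prints the names of machine
# sets whose ready count is below the desired count.
unready_machinesets() {
  awk '$2 + 0 > $3 + 0 { print $1 }'
}
```

Running machineset_status | unready_machinesets prints nothing once every machine set, including the GPU one, has all of its replicas ready.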
Note: This pattern supports five types of vector databases: pgvector, EDB Postgres for Kubernetes, Elasticsearch, Redis, and SQL Server. By default, the pattern deploys pgvector as the RAG DB. To deploy EDB, set global.db.type to EDB in values-global.yaml.
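A typo in db.type is easy to make, so a quick check before installing can help. The sketch below reads the value with a line-oriented awk filter and compares it against the values listed in the comments of values-global.yaml. Both helper names are hypothetical. The filter is not a YAML parser: it assumes an unquoted type: key on its own line somewhere after a db: line.

```shell
# db_type_of FILE  (hypothetical helper)
# Prints the value of the first "type:" key that follows a "db:" key.
db_type_of() {
  awk '
    /^[ \t]*db:[ \t]*$/          { in_db = 1; next }
    in_db && /^[ \t]*type:/      { print $2; exit }
  ' "$1"
}

# is_valid_db_type VALUE  (hypothetical helper)
# Succeeds if VALUE is one of the db.type values documented in
# values-global.yaml.
is_valid_db_type() {
  case "$1" in
    REDIS|EDB|PGVECTOR|ELASTIC|MSSQL|AZURESQL) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_valid_db_type "$(db_type_of values-global.yaml)" fails if the value is misspelled or written in lower case.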
For reference, values-global.yaml contains:

---
global:
  pattern: rag-llm-gitops
  options:
    useCSV: false
    syncPolicy: Automatic
    installPlanApproval: Automatic
  # Possible values for RAG vector DB db.type:
  #   REDIS    -> Redis (Local chart deploy)
  #   EDB      -> PGVector via EDB operator (Local chart deploy)
  #   PGVECTOR -> PGVector (Local Postgres chart deploy)
  #   ELASTIC  -> Elasticsearch (Local chart deploy)
  #   MSSQL    -> MS SQL Server (Local chart deploy)
  #   AZURESQL -> Azure SQL (Pre-existing in Azure)
  db:
    index: docs
    type: PGVECTOR
  # Models used by the inference service (should be a HuggingFace model ID)
  model:
    vllm: ibm-granite/granite-3.3-8b-instruct
    embedding: sentence-transformers/all-mpnet-base-v2
  storageClass: gp3-csi
main:
  clusterGroupName: hub
  multiSourceConfig:
    enabled: true
    clusterGroupChartVersion: 0.9.*

Deploy the validated pattern. The following command takes about 15-20 minutes:

./pattern.sh make install

- Log in to the OpenShift web console.
- Navigate to Workloads --> Pods.
- Select the rag-llm project from the drop-down.
- The following pods should be up and running.
Note: If the hf-text-generation-server is not running, make sure you have followed the steps to configure a node with GPU from the instructions provided above.
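The same check can be made from the command line instead of the web console. The following is a minimal sketch: the helper name pods_not_ready is hypothetical, and it relies on the default column order of oc get pods (NAME, READY, STATUS, ...). It lists every pod in the rag-llm project that is neither Completed nor Running with all of its containers ready.

```shell
# pods_not_ready [NAMESPACE]  (hypothetical helper)
# Prints "NAME STATUS READY" for each pod that is not Completed and is
# either not Running or has fewer ready containers than expected.
# NAMESPACE defaults to rag-llm.
pods_not_ready() {
  oc get pods -n "${1:-rag-llm}" --no-headers |
    awk '{
      split($2, r, "/")
      if ($3 == "Completed") next
      if ($3 != "Running" || r[1] != r[2]) print $1 " " $3 " " $2
    }'
}
```

An empty result means all pods are up. A line for the text generation server pod usually points back to the GPU node setup described in the note above.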
- Click the Application box icon in the header, and select Retrieval-Augmented-Generation (RAG) LLM Demonstration UI.
- The application uses the default provider and model configured as part of the application deployment. The default provider is a Hugging Face model server running in OpenShift. The model server is deployed with this validated pattern and requires a node with a GPU.
- Enter any company name.
- Enter the product as RedHat OpenShift.
- Click the Generate button. A project proposal should be generated. The project proposal also contains references to the RAG content, and it can be downloaded as a PDF document.
You can optionally add more providers. The application supports the following providers:
- Hugging Face Text Generation Inference Server
- OpenAI
- NVIDIA
Click the Add Provider tab to add a new provider. Fill in the details and click the Add Provider button. The new provider should then appear in the Providers dropdown under the Chatbot tab.
Follow the instructions in step 3 to generate the proposal document using the OpenAI provider.
You can rate the model by clicking the Rate the model radio button. The rating is captured as part of the metrics and can help the company decide which model to deploy in production.
By default, the Grafana application is deployed in the llm-monitoring namespace. To launch the Grafana dashboard, follow the instructions below:
- Grab the credentials of Grafana Application
- Navigate to Workloads --> Secrets
- Click grafana-admin-credentials and copy the values of GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD
- Launch Grafana Dashboard
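The Grafana credentials can also be read from the command line. This sketch uses the secret name, namespace, and key names given in the steps above; the helper name grafana_credential is hypothetical. Secret values are stored base64-encoded, so the helper decodes them before printing.

```shell
# grafana_credential KEY  (hypothetical helper)
# Prints the decoded value of KEY from the grafana-admin-credentials secret
# in the llm-monitoring namespace.
grafana_credential() {
  oc get secret grafana-admin-credentials -n llm-monitoring \
    -o jsonpath="{.data.$1}" | base64 --decode
}
```

For example, grafana_credential GF_SECURITY_ADMIN_USER prints the admin user name, and grafana_credential GF_SECURITY_ADMIN_PASSWORD prints the password. Avoid running the second command where the terminal output is logged or shared.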
GOTO: Test Plan
EDB Postgres for Kubernetes is distributed under the EDB Limited Usage License Agreement, available at enterprisedb.com/limited-use-license.