This experimental chat application is implemented in Microsoft ASP.NET/C# and deployed on Microsoft Azure using Azure App Service and other Azure services.
An Azure-deployed instance of Ollama (slow, since it runs without a GPU) serves responses from the local llama3.2:3b model. Users can also access the gpt-4o model hosted on Azure AI (the fastest option), while the most advanced capabilities are available via gpt-5-mini on OpenAI.
The current RAG (Retrieval-Augmented Generation) implementation is basic, built on a simple vector database.
RAG is one of the most common approaches for customizing responses based on the content of internal documents.
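The core retrieval step behind such a RAG setup can be sketched as follows. This is an illustrative Python sketch, not the application's actual C# code: the `Chunk` structure, the toy two-dimensional embeddings, and the prompt template are all hypothetical. The idea is the same as in the app: embed the user question, rank stored document chunks by cosine similarity, and prepend the best matches to the LLM prompt.

```python
# Illustrative sketch of RAG retrieval over a simple in-memory vector store.
# All names and the toy embeddings here are hypothetical, for explanation only.
import math
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    embedding: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_embedding: list[float], store: list[Chunk], k: int = 3) -> list[Chunk]:
    # Rank stored chunks by similarity to the query and keep the best k.
    return sorted(store, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)[:k]

def build_prompt(question: str, query_embedding: list[float], store: list[Chunk]) -> str:
    # Prepend the retrieved chunks as context for the LLM.
    context = "\n".join(c.text for c in top_k(query_embedding, store))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy example: the query embedding is closest to the first chunk,
# so that chunk is ranked first in the assembled prompt.
store = [
    Chunk("Vacation policy: 25 days per year.", [1.0, 0.0]),
    Chunk("Office address: 1 Main St.", [0.0, 1.0]),
]
prompt = build_prompt("How many vacation days do I get?", [0.9, 0.1], store)
```

In a production version, the hand-rolled cosine ranking would be replaced by queries against the vector database, and the embeddings would come from an embedding model rather than being written by hand.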
You can define your own prompt to control how answers are generated. The text entered below overrides the role selected via the radio buttons above.
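A custom prompt entered here might look like the following (a hypothetical example, not a built-in default):

```
You are a support assistant for our internal documentation. Answer only from
the retrieved context; if the answer is not present there, say you don't know.
Keep answers under three sentences.
```

Whatever text is supplied replaces the preselected role entirely, so it should fully describe the assistant's intended behavior.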