Quickstart
This section provides a quickstart example for creating an AI Agent with Llama Stack.
Prerequisites
- Python 3.12 or higher (if not satisfied, refer to FAQ: How to prepare Python 3.12 in Notebook)
- Llama Stack Server installed and running via Operator (see Install Llama Stack), with `VLLM_URL` pointing at a vLLM-served model endpoint and `POSTGRES_*` configured for server persistence (see install notes)
- Access to a Notebook environment (e.g., Jupyter Notebook, JupyterLab)
- Python environment with `llama-stack-client==0.7.1`, `fastmcp` (for the MCP section), and other notebook dependencies installed
Quickstart Example
A simple example of creating an AI Agent with Llama Stack is available in the following resources:
- Notebook: Llama Stack Quick Start Demo
Download the notebook and upload it to a Notebook environment to run.
The notebook demonstrates:
- Two tool options: client-side tools (`@client_tool`) and MCP tools (FastMCP + `toolgroups.register`)
- Shared agent flow: connect to Llama Stack Server, select a model, create an `Agent` with `tools=AGENT_TOOLS`, then run sessions and streaming turns
- Optional vector store flows: upload a file, create a `pgvector`- or `milvus-remote`-backed vector store, and run a search query
- Streaming responses and event logging
- Optional FastAPI deployment of the agent
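The shared agent flow can be sketched roughly as follows. This is a sketch, not the notebook itself: the base URL, instructions, and tool list are placeholders, and the import is deferred so the sketch can be read without a running server. The `Agent` and `AgentEventLogger` helpers are the documented `llama-stack-client` Python API.

```python
def run_quickstart_agent(base_url: str, model_id: str, agent_tools: list):
    """Sketch of the notebook's agent flow:
    client -> model -> Agent -> session -> streaming turn -> event logging."""
    # Deferred import: requires llama-stack-client and a running Llama Stack Server.
    from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient

    client = LlamaStackClient(base_url=base_url)  # e.g. the route to your server (placeholder)

    agent = Agent(
        client,
        model=model_id,                    # pick an LLM from client.models.list()
        instructions="You are a helpful assistant.",  # placeholder system prompt
        tools=agent_tools,                 # AGENT_TOOLS in the notebook
    )

    session_id = agent.create_session("quickstart-session")
    turn = agent.create_turn(
        messages=[{"role": "user", "content": "Hello!"}],
        session_id=session_id,
        stream=True,                       # streaming turn, as in the notebook
    )
    # Event logging over the streamed turn, as in the notebook.
    for event in AgentEventLogger().log(turn):
        event.print()
```

The function is only a template for the notebook cells; call it with your own server URL and a model id returned by `client.models.list()`.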
Vector Store Usage
The downloadable notebook includes optional PGVector and Milvus sections.
For PGVector, start the server with `ENABLE_PGVECTOR=true` and valid `PGVECTOR_*` connection settings, then execute the PGVector cells in the notebook. ACP-provided PostgreSQL can be used directly because it already includes the pgvector extension.
For Milvus, start the server with `MILVUS_ENDPOINT`, optional `MILVUS_TOKEN`, and `MILVUS_CONSISTENCY_LEVEL`, then execute the Milvus cells in the notebook. Use `provider_id="milvus-remote"` in the client request.
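Collected in one place, the server-side settings named above might look like the following. All values are placeholders for your deployment; `ENABLE_PGVECTOR` and the `MILVUS_*` names come from this section, while the individual `PGVECTOR_*` names are assumptions following the usual llama-stack pgvector provider settings.

```shell
# PGVector-backed vector store (values are placeholders; PGVECTOR_* names assumed)
export ENABLE_PGVECTOR=true
export PGVECTOR_HOST=postgres.example.svc
export PGVECTOR_PORT=5432
export PGVECTOR_DB=llamastack
export PGVECTOR_USER=llamastack
export PGVECTOR_PASSWORD=change-me

# Milvus-backed vector store (values are placeholders)
export MILVUS_ENDPOINT=http://milvus.example.svc:19530
export MILVUS_TOKEN=change-me            # optional
export MILVUS_CONSISTENCY_LEVEL=Bounded  # optional
```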
For both vector-store examples, `client.models.list()` must include an embedding model, for example `sentence-transformers/nomic-ai/nomic-embed-text-v1.5`. If it only returns LLM models, restart the LlamaStackDistribution with `ENABLE_SENTENCE_TRANSFORMERS=true` and configure Hugging Face cache/download access as described in Install Llama Stack.
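That check can be done in a single notebook cell. The filter below assumes the records returned by `client.models.list()` expose a `model_type` attribute (as in recent `llama-stack-client` releases); stand-in records are used here so the sketch runs without a server.

```python
from types import SimpleNamespace


def embedding_models(models):
    """Return only the models whose model_type is 'embedding'."""
    return [m for m in models if getattr(m, "model_type", None) == "embedding"]


# In the notebook: models = client.models.list()
# Stand-in records for illustration:
models = [
    SimpleNamespace(identifier="my-llm", model_type="llm"),
    SimpleNamespace(
        identifier="sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
        model_type="embedding",
    ),
]

found = embedding_models(models)
print([m.identifier for m in found])
# → ['sentence-transformers/nomic-ai/nomic-embed-text-v1.5']
```

If the list comes back empty against your real server, restart the LlamaStackDistribution with `ENABLE_SENTENCE_TRANSFORMERS=true` as described above.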
The notebook example covers:
- Uploading a file through `client.files.create(...)`
- Creating a vector store with `provider_id="pgvector"` or `provider_id="milvus-remote"`
- Passing `embedding_model` and `embedding_dimension` through `client.vector_stores.create(..., extra_body=...)`
- Running a search with `client.vector_stores.search(...)`; PGVector uses `search_mode="hybrid"` in `extra_body`
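The four steps above can be sketched together as below. The file path, store name, query, and embedding dimension are placeholders, and the import is deferred so the sketch can be read without a running server; the `extra_body` keys follow this section's description rather than a fixed API contract.

```python
def vector_store_demo(base_url: str, provider_id: str = "pgvector"):
    """Sketch of the notebook's vector-store flow: upload -> create store -> search."""
    # Deferred import: requires llama-stack-client and a running Llama Stack Server.
    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url=base_url)

    # 1. Upload a file (path is a placeholder).
    with open("example.txt", "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="assistants")

    # 2. Create a vector store; provider and embedding settings go through extra_body.
    store = client.vector_stores.create(
        name="quickstart-store",
        file_ids=[uploaded.id],
        extra_body={
            "provider_id": provider_id,  # "pgvector" or "milvus-remote"
            "embedding_model": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
            "embedding_dimension": 768,  # placeholder; must match the embedding model
        },
    )

    # 3. Search; PGVector additionally supports hybrid search via extra_body.
    extra = {"search_mode": "hybrid"} if provider_id == "pgvector" else {}
    return client.vector_stores.search(
        vector_store_id=store.id,
        query="What does the document say?",
        extra_body=extra,
    )
```

Treat the notebook cells as the reference; this function only mirrors their order of operations.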
FAQ
How to prepare Python 3.12 in Notebook
1. Download the pre-compiled Python installation package.
2. Extract the package.
3. Install Python 3.12 and register the kernel.
4. Switch the kernel in the notebook page:
- Open your Notebook environment (e.g., Jupyter Notebook or JupyterLab) in the browser, then open an existing notebook or create a new one.
- In the notebook interface, find the current kernel name (usually shown in the top-right corner of the page, e.g., "Python 3" or "python3").
- Click that kernel name, or use the menu Kernel → Change Kernel.
- In the kernel list, select "Python 3.12" (the display name registered in step 3).
- After switching, new cells will run with Python 3.12.
Note: `python` and `pip` commands executed directly in notebook cells still use the default interpreter. To run the Python 3.12 installation, invoke it by its full path.
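Steps 1–3 above can be sketched as shell commands. The download URL, archive name, and install directory are placeholders for whichever pre-compiled Python 3.12 build your environment provides; only the `ipykernel` registration at the end (which produces the "Python 3.12" kernel selected in step 4) is standard.

```shell
# 1. Download a pre-compiled Python 3.12 package (URL is a placeholder).
curl -LO https://example.com/python-3.12.x-linux-x86_64.tar.gz

# 2. Extract it into a local directory (paths are placeholders).
mkdir -p "$HOME/python312"
tar -xzf python-3.12.x-linux-x86_64.tar.gz -C "$HOME/python312" --strip-components=1

# 3. Install ipykernel with the new interpreter and register it as a notebook kernel.
"$HOME/python312/bin/python3.12" -m pip install ipykernel
"$HOME/python312/bin/python3.12" -m ipykernel install --user \
    --name python312 --display-name "Python 3.12"
```

After registration, the "Python 3.12" kernel appears in the kernel list described in step 4.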
Additional Resources
For more resources on developing AI Agents with Llama Stack, see:
- Llama Stack Documentation - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.
- Llama Stack Core Concepts - Deep dive into Llama Stack architecture, API stability, and resource management.
- Llama Stack GitHub Repository - Source code, example applications, distribution configurations, and how to add new API providers.
- Llama Stack Example Apps - Official examples demonstrating how to use Llama Stack in various scenarios.