When building AI agents, one of the first requirements is connecting your agent to a Large Language Model (LLM). The Agent Stack helps with this by providing built-in OpenAI-compatible LLM inference. The platform’s OpenAI endpoints are model and provider agnostic, serving as a proxy to whatever is configured. For you as an agent builder, usage is straightforward because the functionality is wrapped in a Service Extension.
Quickstart
Add LLM service extension to your agent
Import the necessary components and add the LLM service extension to your agent function.
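A minimal sketch of the imports, assuming the classes described under “How to request LLM access” below (adjust to your SDK version):

```python
from typing import Annotated

from agentstack_sdk.a2a.extensions import (
    LLMServiceExtensionServer,
    LLMServiceExtensionSpec,
)
```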
Example of LLM Access
Here’s how to add LLM inference capabilities to your agent:
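The sketch below follows the steps listed in the next section. Treat the first two parameters (`message`, `context`) and the `suggested` keyword argument as illustrative placeholders; only the pieces named in the steps are taken from this documentation.

```python
from typing import Annotated

from agentstack_sdk.a2a.extensions import (
    LLMServiceExtensionServer,
    LLMServiceExtensionSpec,
)


async def my_agent(
    message,  # incoming message (placeholder)
    context,  # run context (placeholder)
    llm: Annotated[
        LLMServiceExtensionServer,
        LLMServiceExtensionSpec.single_demand(
            suggested=("ibm/granite-3-3-8b-instruct",)
        ),
    ],
):
    # Service extensions are optional: always check before using.
    if not llm or not llm.data:
        return "LLM service extension is not available."

    # Retrieve the configuration allocated for our single demand.
    fulfillment = llm.data.llm_fulfillments.get("default")
    if fulfillment is None:
        return "No LLM was fulfilled for this request."

    # fulfillment.api_model, fulfillment.api_key, and fulfillment.api_base
    # plug into any OpenAI-compatible client (see the examples below).
    return f"Using model: {fulfillment.api_model}"
```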
How to request LLM access
Here’s what you need to know to add LLM inference capabilities to your agent:
1. Import the extension: Import LLMServiceExtensionServer and LLMServiceExtensionSpec from agentstack_sdk.a2a.extensions.
2. Add the LLM parameter: Add a third parameter to your agent function with the Annotated type hint for LLM access.
3. Specify your model requirements: Use LLMServiceExtensionSpec.single_demand() to request a single model (multiple models will be supported in the future).
4. Suggest a preferred model: Pass a tuple of suggested model names to help the platform choose the best available option.
5. Check if the extension exists: Always verify that the LLM extension is provided before using it, since service extensions are optional.
6. Access the LLM configuration: Use llm.data.llm_fulfillments.get("default") to get the LLM configuration details.
7. Use it with your LLM client: The platform provides api_model, api_key, and api_base, which work with OpenAI-compatible clients; see the example after this list.
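For example, inside the async agent function sketched above, the fulfillment values can be handed straight to the OpenAI Python client (a sketch; `fulfillment` is the object returned in step 6):

```python
from openai import AsyncOpenAI

# Inside the async agent function, after retrieving the fulfillment:
client = AsyncOpenAI(
    api_key=fulfillment.api_key,
    base_url=fulfillment.api_base,
)

response = await client.chat.completions.create(
    model=fulfillment.api_model,
    messages=[{"role": "user", "content": "Hello!"}],
)
reply_text = response.choices[0].message.content
```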
Understanding LLM Configuration
The platform automatically provides you with:
- api_model: The specific model identifier that was allocated to your request
- api_key: Authentication key for the LLM service
- api_base: The base URL for the OpenAI-compatible API endpoint
These values work with any OpenAI-compatible client, including:
- BeeAI Framework
- LangChain
- LlamaIndex
- OpenAI Python client
- Custom implementations
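As an illustration, here is a sketch of the same fulfillment values wired into LangChain’s OpenAI-compatible chat model (`fulfillment` again comes from step 6 above):

```python
from langchain_openai import ChatOpenAI

# model / api_key / base_url come from the platform's fulfillment.
chat = ChatOpenAI(
    model=fulfillment.api_model,
    api_key=fulfillment.api_key,
    base_url=fulfillment.api_base,
)

reply = chat.invoke("Say hello in one sentence.")
print(reply.content)
```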
Model Selection
When you specify a suggested model like "ibm/granite-3-3-8b-instruct", the platform will:
- Check if the requested model is available in your configured environment
- Allocate the best available model that matches your requirements
- Provide you with the exact model identifier and endpoint details
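To give the platform more room to match, you can suggest several names in order of preference. This is a sketch assuming the same single_demand() signature as above; the second model name is purely illustrative:

```python
llm_spec = LLMServiceExtensionSpec.single_demand(
    suggested=(
        "ibm/granite-3-3-8b-instruct",       # first choice
        "meta-llama/llama-3-1-8b-instruct",  # illustrative fallback
    )
)
```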