Briefly© (Part 2): Building a Context-Aware LLM Advisor with MCP and RAG

This post outlines the internal architecture behind Briefly©, an open-source, LLM-powered investment assistant. It focuses on three core components:

  • MCP (Model Context Protocol): the agent-tool communication system
  • Portfolio- and news-grounded RAG: context-aware retrieval over structured and unstructured financial data
  • Context memory: the use of persistent user state to inform investment guidance

The design goal is to move beyond single-turn retrieval-augmented prompts by creating a modular, memory-integrated agent that interfaces with portfolios and real-world data through structured tool calls.


Model Context Protocol

Briefly implements a client–provider orchestration system called MCP. The protocol separates agent planning from execution. When the assistant needs to retrieve or process data, it doesn't do so directly – it requests tools.

Components:

  • MCP client: Handles tool execution and communication with the LLM
  • LLM: Plans tool usage using structured function calls
  • Providers: Modular endpoints that fulfill tool calls (e.g., fetch portfolio, fetch news, summarize news, generate advice template)

Example tool call flow:

query → LLM invokes "prepare_advice_template"  
            → Client routes to provider endpoint  
            → Provider builds and returns prompt data  
            → LLM generates response

The agent logic remains LLM-side, while all side-effects (data lookup, formatting, etc.) are handled externally. This enforces strict separation of inference and operations.
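
To make that separation concrete, the routing step can be reduced to little more than a lookup table and an HTTP call. The sketch below is illustrative: the endpoint URLs and the dispatch_tool_call helper are assumptions for this post, not the actual Briefly code.

import httpx

# Illustrative mapping of tool names to provider endpoints (hypothetical URLs)
PROVIDER_ENDPOINTS = {
    "get_portfolio": "http://providers.local/get-portfolio",
    "retrieve_news": "http://providers.local/retrieve-news",
    "prepare_advice_template": "http://providers.local/prepare-advice-template",
}

def dispatch_tool_call(tool_name: str, arguments: dict) -> dict:
    # Route a single LLM tool call to its provider and return the structured result
    url = PROVIDER_ENDPOINTS[tool_name]
    response = httpx.post(url, json=arguments, timeout=10.0)
    response.raise_for_status()
    return response.json()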


Context through memory and portfolios

Each user's investment state is captured in a persistent user state that includes a portfolio object and an investment objective. This state contains:

  • Asset allocations
  • Investment goals, short- and/or long-term

When a query is made, the assistant retrieves the relevant portfolio summary and incorporates it into its reasoning. It does not rely solely on prompt memory – context is passed explicitly through the MCP.

The user state becomes the grounding document for all decisions. It is not static. As goals change or assets shift, the memory context updates accordingly.
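
As a rough sketch, such a user state might look like the following. The field names are illustrative assumptions, not Briefly's actual schema.

from dataclasses import dataclass, field

@dataclass
class Holding:
    ticker: str
    weight: float   # fraction of total portfolio value
    sector: str
    region: str

@dataclass
class UserState:
    user_id: str
    portfolio_id: str
    holdings: list[Holding] = field(default_factory=list)
    # Short- and/or long-term objectives, e.g. "reduce risk", "grow capital"
    investment_goals: list[str] = field(default_factory=list)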


RAG for news data

To adapt to market conditions, the agent integrates retrieval-augmented generation over current news data sources – if relevant articles don't already exist in the object (NoSQL) database provider, they are fetched using a news provider.

A query like "What should I do with my mineral stocks this quarter?" triggers the following:

  1. Portfolio provider returns a structured summary of the user's assets and exposures
  2. News provider fetches relevant recent headlines from the DB via a vector search (using the user's query and portfolio)
  3. News scraper fetches and summarizes relevant recent headlines from external sources such as Google News
  4. The orchestrator returns the final investment advice based on an advice template combined with static and dynamic context

The assistant doesn't rely on static information. Every answer is assembled with both stored context and dynamic market input.
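
As a minimal sketch of the vector-search step (step 2 above), assume each stored article carries a precomputed embedding and the query vector is the embedding of the user's question plus portfolio summary; the field names here are hypothetical.

import numpy as np

def retrieve_news(query_vec: np.ndarray, articles: list[dict], k: int = 5) -> list[dict]:
    # Rank pre-embedded news articles by cosine similarity to the query embedding
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(articles, key=lambda art: cosine(query_vec, art["embedding"]), reverse=True)
    return ranked[:k]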


Query execution flow

A full advice request proceeds as follows:

  • User inputs a question
  • MCP Client sends the prompt to the LLM
  • LLM returns a tool plan (e.g., request portfolio summary, fetch news)
  • Client routes each call to its respective provider
  • Once all results are returned, the LLM receives final context and generates an answer

MCP client and providers

Phase 1: Input validation

Before the LLM is even called, the MCP client runs two provider checks:

  • /validate-prompt: Is the question even relevant to investment?
  • /validate-investment-goal: Can we infer an investment objective from this question?

If either of these checks fails, the system returns a pre-generated message and exits early.
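
A sketch of that early-exit check, assuming both routes return the { "valid": true | false } payload described later in this post (the base URL, helper name, and refusal messages are illustrative):

import httpx

PROVIDER_BASE = "http://providers.local"  # illustrative base URL

def validate_inputs(question: str, portfolio_id: str, user_id: str) -> str | None:
    # Returns a pre-generated refusal message if validation fails, or None to proceed
    payload = {"question": question, "portfolio_id": portfolio_id, "user_id": user_id}
    if not httpx.post(f"{PROVIDER_BASE}/validate-prompt", json=payload).json()["valid"]:
        return "I can only help with investment-related questions."
    if not httpx.post(f"{PROVIDER_BASE}/validate-investment-goal", json=payload).json()["valid"]:
        return "Could you clarify your investment objective?"
    return None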

Phase 2: LLM + tool call loop

If the inputs are valid, the MCP client initializes the system/user messages and enters a loop. The assistant is allowed to make tool calls (e.g. "prepare_advice_template"), which the client intercepts and dispatches:

loop:  
  send current messages to LLM  
  if response is final → return answer  
  if response includes tool calls:  
    for each tool_call:  
      lookup endpoint  
      send HTTP request to provider  
      receive structured result  
      append tool response to message list  
      if tool == "prepare_advice_template":  
          append advice_prompt to messages  
          break and continue chat with complete context

Once the "prepare_advice_template" tool is called, the client injects the generated prompt directly into the conversation and ends the loop. This lets the assistant continue the conversation with updated context – grounded in portfolio memory and news.

Phase 3: Final response

After context assembly, a final LLM call is made using all accumulated messages. The assistant returns a single structured summary, which is archived and displayed.

Provider Responsibilities

Each tool call from the LLM maps to one provider endpoint. These are plain FastAPI routes that delegate to service functions. Here's what each one does:

/validate-prompt

Purpose: Checks whether the user's question is relevant to investing.

Details: Receives the question, portfolio ID, and user ID. Calls a validation function, which is itself a prompt to a small GPT model, to determine whether the user's question is investment related.

Returns:

{ "valid": true | false }

/validate-investment-goal

Purpose: Extracts whether the question contains a clear investment objective (e.g. reduce risk, grow capital).

Details: Uses a small GPT model to determine whether the user's investment objective is already present in the context (via the memory provider) or can be inferred from the question. If neither, a false boolean is returned and the user is asked to clarify their objective.

Returns:

{ "valid": true | false }

/retrieve-news

Purpose: Retrieves relevant financial news summaries based on the portfolio's focus and the user's query.

Details: Runs a custom scraper (and/or LangChain wrapper) to fetch and summarize recent headlines. Filters results based on asset class or strategy tags.

Returns:

{ "summaries": [ { "title": ..., "summary": ... }, ... ] }

/get-portfolio

Purpose: Retrieves the portfolio's assets and summarizes them based on the user's portfolio ID, representing the portfolio in terms of asset, region, and sector concentration.

Details: Calls DB functions to fetch the portfolio, calculates concentrations, and returns a string representation of the portfolio.

Returns:

{ "portfolio_summary": "string representation of portfolio composition and exposures" }

This system gives the LLM everything it needs – without direct access to raw data or internal APIs. The MCP client handles logic flow; the providers handle data. The result is a stateless, auditable, and memory-augmented architecture that can be extended.


Error handling

Tool failures (e.g., a timeout or invalid data) trigger retry logic within the MCP client. Providers return structured error codes.
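
A minimal sketch of what that retry logic could look like on the client side (the retry count and backoff are illustrative choices):

import time
import httpx

def call_provider_with_retry(url: str, payload: dict, retries: int = 3) -> dict:
    # POST to a provider, retrying on timeouts or error status codes
    for attempt in range(1, retries + 1):
        try:
            response = httpx.post(url, json=payload, timeout=10.0)
            response.raise_for_status()
            return response.json()
        except (httpx.TimeoutException, httpx.HTTPStatusError):
            if attempt == retries:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff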


Summary

  • LLM orchestration should not rely on long prompts or chained conditions. Briefly© uses MCP to modularize logic cleanly.
  • Context should be persisted, not re-prompted. Portfolios serve as structured, queryable memory.
  • RAG is only effective when the data is relevant. Strategy-tagged retrieval ensures responses are both current and user-specific.
  • Keeping execution out of the agent allows for scalability, testing, and future tool expansion without retraining.

Upcoming improvements

Planned improvements include:

  • Expanded memory features (e.g., progress tracking of whether the user followed up on investment advice, and revision tracking)
  • Multi-agent coordination (e.g., compare strategies between portfolios)
  • Real-time financial data integration (e.g., asset price feeds, asset tickers)

The core protocol is in place. The focus now is on refining tooling and surfacing more contextual behavior through memory-aware reasoning.