AI agents are moving beyond simple command-line tools into systems that can plan, schedule, call tools, and run automated workflows. Nous Research’s Hermes Agent framework offers a self-hosted runtime for building advanced agents with state management, tool integration, and secure execution.
It supports multi-step planning, background task control, and real-world automation beyond single-purpose coding assistants. In this article, we explore Hermes Agent’s architecture, setup, security model, and practical examples for building reliable AI agent workflows.
What is Hermes Agent and How is it Built?
Hermes is not just a prompt wrapper: it is an open-source agent runtime with multiple entry points, including a CLI, API server, and messaging gateway. It combines browser automation, terminal execution, file operations, memory, skills, and scheduling to support a wide range of real-world automation workflows.
Its layered architecture separates concerns and keeps the system manageable. User requests enter through the CLI or API, then move into the agent core, which generates prompts, calls the language model, runs tools, handles retries, and can fall back to alternate models when needed. This makes Hermes more resilient to rate limits, server errors, and authentication issues.
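The retry-and-fallback behaviour can be sketched as follows. This is an illustrative pattern, not Hermes's actual internals: the model names and the `call_model` signature are made up for the example.

```python
import time

def call_with_fallback(prompt, models, call_model, retries=2, backoff=0.5):
    """Try each model in order; retry transient errors before falling back."""
    last_error = None
    for model in models:
        for attempt in range(retries):
            try:
                return model, call_model(model, prompt)
            except Exception as exc:  # rate limit, 5xx, auth failure, ...
                last_error = exc
                time.sleep(backoff * (attempt + 1))
    raise RuntimeError(f"all models failed: {last_error}")

# Example with a fake provider: the primary model always fails,
# so the call falls back to the backup model.
def fake_call(model, prompt):
    if model == "primary/model":
        raise ConnectionError("rate limited")
    return f"{model} answered: {prompt}"

used, reply = call_with_fallback(
    "hello", ["primary/model", "backup/model"], fake_call, backoff=0
)
print(used)  # backup/model
```

The key design point is that transient errors are retried on the same model before the runtime gives up and moves down the fallback list.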
The diagram below combines the official architecture, agent loop, session storage, and tools runtime documentation.
The Agent Loop and State Management
Hermes shows its strength inside the agent turn loop. It runs one call per tool, but when the model requests multiple tools, Hermes executes them in parallel through a thread pool, speeding up complex workflows. It also manages the model context window by compressing conversations once they exceed 50% of the available context, while preserving recent messages and grouping related tool calls and results logically.
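The parallel tool-execution idea can be sketched with a standard thread pool. The tool names and timings below are invented for illustration; they are not Hermes's real tools.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_tools_in_parallel(tool_calls):
    """Execute a batch of tool calls concurrently, collecting results in order."""
    with ThreadPoolExecutor(max_workers=len(tool_calls)) as pool:
        futures = [pool.submit(fn, *args) for fn, args in tool_calls]
        return [f.result() for f in futures]

def slow_tool(name, delay=0.2):
    time.sleep(delay)  # stands in for real I/O: web fetch, file read, shell
    return f"{name}: done"

start = time.monotonic()
results = run_tools_in_parallel([
    (slow_tool, ("web_search",)),
    (slow_tool, ("read_file",)),
    (slow_tool, ("terminal",)),
])
elapsed = time.monotonic() - start
print(results)
# Three 0.2 s tools run concurrently, so elapsed is roughly 0.2 s, not 0.6 s.
```

Running I/O-bound tool calls concurrently is exactly where a thread pool pays off, since each call spends most of its time waiting.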
State management is handled through a local SQLite database with full-text search, allowing the agent to revisit past sessions and retrieve relevant context. Long-term memory is stored in two Markdown files: MEMORY.md for general facts and USER.md for user-specific preferences. Hermes also supports skills as procedural memory, letting agents create, update, and remove workflows over time.
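A minimal sketch of full-text session search with SQLite is shown below. The table layout is hypothetical; Hermes's actual schema may differ, but the FTS5 `MATCH` mechanism is the same idea.

```python
import sqlite3

# In-memory database with an FTS5 virtual table (schema is illustrative).
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id, role, content)")
db.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("s1", "user", "plan the quarterly sales report automation"),
        ("s2", "user", "debug the cron job for daily backups"),
        ("s2", "assistant", "the backup cron job runs at 02:00"),
    ],
)

# FTS5 MATCH retrieves past turns relevant to a new request, ranked by relevance.
rows = db.execute(
    "SELECT session_id, content FROM messages WHERE messages MATCH ? ORDER BY rank",
    ("cron",),
).fetchall()
for session_id, content in rows:
    print(session_id, "->", content)
```

This requires an SQLite build with FTS5 enabled, which is the case for standard Python distributions on most platforms.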
Since Hermes is evolving quickly, tool counts and details may vary across documentation pages. For serious use, pin the Hermes version to keep results repeatable and avoid breaking configurations.
Installation and Environment Setup
Hermes offers a clean, single-line installer. Note that native Windows is not supported; Windows users should use WSL2. The only prerequisite is Git: the installer automatically sets up the correct versions of Python, Node.js, and the other command-line tools it needs.
# Linux / macOS / WSL2 / Android (Termux)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# Reload your shell
source ~/.bashrc # or source ~/.zshrc
# Choose your model/provider interactively
hermes model
In this blog, we will set up a local Ollama model inside the Hermes agent:
- Make sure Ollama is installed and running in the background
- Go to “Custom Endpoint” in the model providers
- Enter http://127.0.0.1:11434/v1 as the API base URL
- No API key is needed, so press Enter when prompted
- Then select from the models you have in Ollama, whether local or cloud
# Diagnose setup if needed
hermes doctor
Let’s test the agent by typing the following in the terminal:
hermes chat
One of the best design decisions in Hermes is its configuration management, which uses two separate files. Secrets, such as API keys, go in ~/.hermes/.env, while non-secret settings are stored in ~/.hermes/config.yaml. This separation is a security best practice, and the hermes config set command automatically writes each value to the correct file.
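The routing decision can be sketched in a few lines. The heuristic below (treating anything that looks like a key or token as a secret) is made up for the example; the real hermes config set command decides this internally.

```python
# Illustrative sketch: secrets go to .env, everything else to config.yaml.
SECRET_MARKERS = ("_KEY", "_TOKEN", "_SECRET", "PASSWORD")

def target_file(setting_name):
    """Return which config file a given setting should be written to."""
    upper = setting_name.upper()
    if any(marker in upper for marker in SECRET_MARKERS):
        return "~/.hermes/.env"
    return "~/.hermes/config.yaml"

print(target_file("OPENROUTER_API_KEY"))  # ~/.hermes/.env
print(target_file("display.streaming"))   # ~/.hermes/config.yaml
```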
Creating a Profile
Use a conservative profile for a safe, repeatable setup. The configuration below requires manual approval of sensitive actions, executes terminal commands inside a sandboxed container, and blocks access to private network addresses.
If you want to use an LLM from another provider, first create the secrets file. This enables the API server and configures API keys for your chosen LLM provider and a cloud browser service.
# Secrets and service toggles in ~/.hermes/.env
cat > ~/.hermes/.env <<'EOF'
OPENROUTER_API_KEY=replace-me
BROWSERBASE_API_KEY=replace-me
BROWSERBASE_PROJECT_ID=replace-me
API_SERVER_ENABLED=true
API_SERVER_KEY=replace-me-local-dev
EOF
Then, a main configuration file is created. The following example is based on a Docker backend for the terminal that will allow code to be executed in a secure and separated environment. It is the recommended solution for any serious self-hosted automation.
# Main settings in ~/.hermes/config.yaml
model: anthropic/claude-3-5-sonnet-20240620  # Replace with your provider/model
terminal:
  backend: docker
  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
  container_persistent: true
browser:
  inactivity_timeout: 120
memory:
  memory_enabled: true
  user_profile_enabled: true
approvals:
  mode: manual
security:
  allow_private_urls: false
display:
  streaming: true
Hermes is model-agnostic. Use an API from a provider such as Anthropic or OpenAI, connect through a routing service such as OpenRouter, or point it at a self-hosted OpenAI-compatible API. This article uses a specific model, but everything here extends to any provider and model you prefer.
Hands-on Tutorials: From Automation to Research
Now, let’s explore the practical capabilities of the Hermes Agent. These tutorials demonstrate core features that enable complex, autonomous workflows.
Task Automation with Cron
Hermes includes a real cron subsystem for scheduled tasks. You can create recurring jobs using plain language. These jobs can run scripts, summarize files, or perform other actions. Results can be delivered to your chat, saved to a file, or sent to other platforms. The agent manages these jobs through its cronjob tool.
For example, you can start a chat session and give it a scheduled task.
Input: “Every weekday at 08:30, read ~/reports/daily_sales.csv, summarise anomalies, and send the result to my home channel.”
Hermes will create a job and schedule its next run. You can then inspect and manage your jobs from the command line.
# Inspect and manage jobs from the CLI
hermes cron list
hermes cron status
hermes cron run
hermes cron pause
To prevent runaway loops, Hermes enforces an important safety constraint: a session started by a cron job cannot create new cron jobs. If you try, the agent will block the action. This reflects the framework’s focus on stable, reliable automation.
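The constraint is easy to picture as a guard on the scheduling path. This is a minimal sketch; the class and method names are illustrative, not Hermes's actual API.

```python
class CronPermissionError(Exception):
    pass

class Session:
    def __init__(self, spawned_by_cron=False):
        self.spawned_by_cron = spawned_by_cron
        self.jobs = []

    def create_cron_job(self, schedule, task):
        # Guard against runaway loops: cron-spawned sessions cannot schedule.
        if self.spawned_by_cron:
            raise CronPermissionError("cron-spawned sessions may not schedule new jobs")
        self.jobs.append((schedule, task))

interactive = Session()
interactive.create_cron_job("30 8 * * 1-5", "summarise daily_sales.csv")  # allowed

cron_child = Session(spawned_by_cron=True)
try:
    cron_child.create_cron_job("* * * * *", "spawn more jobs")
except CronPermissionError as exc:
    print("blocked:", exc)
```

Without this guard, a scheduled job that schedules more jobs would grow without bound.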
Web Browsing and Tool Use
The browser tooling in Hermes is powerful. It supports cloud browser providers like Browserbase and can also control a local Chrome or Chromium instance. Instead of just fetching raw HTML, Hermes represents web pages as accessibility trees. This structured format makes it easier for a language model to navigate and interact with page elements.
Let’s try a simple research task. This prompt asks the agent to navigate a website, find information, and summarize an article.
Input: “Open https://news.ycombinator.com, list the top 5 stories, click the first one, then summarise the article’s core claim and any obvious caveats.”
This task showcases the agent’s ability to perform multi-step web interactions. It also provides an opportunity to test its security features. By default, the configuration blocks access to private URLs, so if you ask the agent to open a local address like http://localhost:3000, it should refuse the request.
Failure Mode Input: “Open http://localhost:3000 and take a screenshot of the dashboard.”
With allow_private_urls set to false, Hermes will block this action to prevent a potential Server-Side Request Forgery (SSRF) attack. However, Hermes has a smart solution for developers who need to work with both public sites and local applications. It can be configured to automatically route private URLs to a local browser while sending public URLs to the cloud provider. This is a strong production feature that balances security and convenience.
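The kind of check behind allow_private_urls: false can be sketched with the standard library. A real implementation would also resolve DNS names before deciding; here only literal IPs and obvious localhost names are classified, and unknown hostnames are treated as public for illustration.

```python
import ipaddress
from urllib.parse import urlparse

LOCAL_NAMES = {"localhost", "localhost.localdomain"}

def is_private_url(url):
    """Return True if the URL points at a private or loopback address."""
    host = urlparse(url).hostname or ""
    if host in LOCAL_NAMES:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; real code would resolve DNS first
    return addr.is_private or addr.is_loopback or addr.is_link_local

print(is_private_url("http://localhost:3000"))         # True
print(is_private_url("http://10.0.0.5/admin"))         # True
print(is_private_url("https://news.ycombinator.com"))  # False
```

Blocking such URLs before the browser tool ever sees them is the standard defence against SSRF in agent runtimes.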
Memory and Session Search
Hermes uses its memory files, MEMORY.md and USER.md, to retain information across sessions. These files are injected into the system prompt when a new session starts. This gives the agent consistent context about your preferences and ongoing projects. Hermes is also a self-improving agent: it saves user preferences and refines them over time.
Here is a simple conversation to test its memory.
Turn 1: “Remember that I want CSV outputs, British English, and concise executive summaries.”
Turn 2: “Also remember that my default project language is Python.”
After these turns, start a completely new session and ask a question to check its recall.
Fresh Session Input: “What output format, English variant, and language do I prefer?”
The agent should correctly retrieve the preferences you stored. Memory is injected at the start of a session, so a fresh session is the cleanest way to test this feature. The agent also rejects duplicate memories, so asking it to store the same fact twice is another simple way to see its internal logic at work.
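Duplicate rejection can be pictured as a toy store that normalises each fact before accepting it. The normalisation below (lowercasing plus whitespace collapsing) is invented for the example; Hermes's actual comparison logic is not documented in this detail.

```python
class MemoryStore:
    def __init__(self):
        self._facts = []
        self._seen = set()

    def remember(self, fact):
        """Store a fact unless an equivalent one already exists."""
        key = " ".join(fact.lower().split())  # crude normalisation
        if key in self._seen:
            return False  # duplicate rejected
        self._seen.add(key)
        self._facts.append(fact)
        return True

store = MemoryStore()
print(store.remember("Prefers CSV outputs and British English"))   # True
print(store.remember("prefers csv outputs  and british english"))  # False
```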
Multi-step Planning and Programmatic Tool Calls
For truly complex tasks, Hermes offers advanced multi-step planning tools. These include persistent goals, sub-agent delegation, and programmatic tool calls.
- Goals: You can set a persistent goal with the /goal command. The agent will continue working on this goal across multiple turns until a judge model determines it is complete or you pause it.
- Delegation: You can ask the agent to delegate tasks to sub-agents. These child agents run with isolated contexts and a restricted set of tools. This is useful for breaking a large problem into smaller, parallelizable parts.
- Code Execution: The execute_code tool is perhaps the most powerful feature. It allows the model to write and run a Python script that calls other Hermes tools. The script communicates with the agent over a local RPC bridge. This is highly efficient, as it can collapse a long, token-heavy sequence of tool calls into a single model turn.
Consider a research task that involves searching the web, fetching several pages, and summarizing them. A typical agent might do this with a dozen back-and-forth turns with the model. With execute_code, the model can write one script to do it all.
# Example script for execute_code
from hermes_tools import web_search, web_extract
import json

results = web_search("Rust async runtime comparison 2025", limit=5)
summaries = []
for r in results["data"]["web"]:
    page = web_extract([r["url"]])
    for p in page.get("results", []):
        if p.get("content"):
            summaries.append({
                "title": r["title"],
                "url": r["url"],
                "excerpt": p["content"][:500],
            })
print(json.dumps(summaries, indent=2))
This feature is designed for heavy lifting. It has configurable limits on execution time and output size. If a script times out, the agent receives a timeout status and can decide how to proceed. This makes the agent operations layer more robust and predictable.
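The timeout-and-truncation behaviour can be sketched with a bounded subprocess runner. The limits and status strings below are illustrative, not Hermes's actual values.

```python
import subprocess
import sys

def run_script(code, timeout=2, max_output=1000):
    """Run a generated Python script with a wall-clock timeout and output cap."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"status": "ok", "output": proc.stdout[:max_output]}
    except subprocess.TimeoutExpired:
        # The child is killed; the agent gets a status it can reason about.
        return {"status": "timeout", "output": ""}

fast = run_script("print('hello from the sandboxed script')")
slow = run_script("import time; time.sleep(10)", timeout=1)
print(fast["status"], "/", slow["status"])  # ok / timeout
```

Returning a structured status instead of raising lets the agent loop decide how to proceed after a timeout, as described above.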
Integrations, Comparisons, and Operational Economics
Hermes is designed to integrate with other systems. Its API server lets any front end that supports chat completions connect to it, and the Python library lets you embed the agent in other applications. It can even be exposed as a Model Context Protocol (MCP) server so that other agents can use its tools.
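A hedged sketch of calling the API server from another application is shown below. The URL, path, and auth header are assumptions based on the OpenAI-compatible chat-completions convention; check the Hermes docs for the exact values on your install.

```python
import json

def build_chat_request(api_key, message, model="hermes"):
    """Build an OpenAI-compatible chat-completions request (values assumed)."""
    return {
        "url": "http://127.0.0.1:8000/v1/chat/completions",  # assumed local address
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": message}],
        }),
    }

req = build_chat_request("replace-me-local-dev", "List my pending cron jobs")
print(req["url"])
# Any HTTP client can send it, e.g.:
#   urllib.request.urlopen(
#       urllib.request.Request(req["url"], req["body"].encode(), req["headers"]))
```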
When comparing Hermes to other tools, focus on positioning.
- Hermes Agent: A general-purpose agent runtime for automation, research, and multi-surface deployment, with a wide scope.
- OpenHands: An open platform aimed at enterprise software development and building custom coding agents.
- Claude Code / Codex CLI: Developer focused coding assistants for terminal & IDE workflows.
Hermes itself is free; the costs are operational. The primary expenses are model inference, cloud browser sessions, and sandbox compute. Hermes can help manage these costs through provider routing policies optimized for price or latency. Also plan for benchmark runs, which can be resource intensive.
Conclusion
Hermes Agent stands out because it combines the core pieces needed for real-world AI agents: state, routing, tooling, memory, scheduling, and evaluation hooks in one package. For self-hosted automation enthusiasts, that makes it more than a coding assistant; it becomes a serious operations layer for building useful automations.
Use it with discipline. Pin environment versions, grant only necessary privileges, and test both successful workflows and failure modes. Keep official benchmarks separate from personal results. Used carefully, Hermes can support sophisticated, reliable AI-powered systems.
Frequently Asked Questions
Q1. Is Hermes Agent free?
A. Yes, Hermes Agent is open source under the MIT license. You may only need to pay for LLM inference, cloud tools, browsers, or hosting.
Q2. Can we run Hermes Agent on Windows?
A. Yes, Hermes Agent runs on Windows through WSL2; there is no native Windows build.
Q3. What is the difference between Hermes and a normal coding agent?
A. Hermes offers CLI, API, gateway, memory, scheduling, and security controls, making it broader than coding agents tied to an IDE or CLI.
Harsh Mishra is an AI/ML Engineer who spends more time talking to Large Language Models than actual humans. Passionate about GenAI, NLP, and making machines smarter (so they don’t replace him just yet). When not optimizing models, he’s probably optimizing his coffee intake. 🚀☕