MrCrab: Building a Lightweight Agentic AI Framework for Small Models

Introduction

Over the past year, agentic AI frameworks have proliferated, but most are designed with large models in mind. This creates friction for developers who want to run smaller models locally, whether for cost efficiency, privacy, or speed. After experimenting with OpenClaw, NanoClaw, PicoClaw, and Nanobot, we decided to build our own agent: MrCrab.

Why Small Models Matter

  • Local deployment without dependency on cloud quotas.
  • Lower hardware requirements, making AI accessible to more teams.
  • Faster iteration cycles for debugging and prototyping.
  • Privacy and compliance advantages when data never leaves your infrastructure.

Design Principles of MrCrab

  • Tool Registry: Instead of injecting a massive tool list into every prompt, MrCrab allows the agent to query tools dynamically by keyword.
  • Hybrid Memory: Recent turns are kept in context, older ones are summarized, and full logs are stored in persistent memory for retrieval on demand.
  • Backend Flexibility: MrCrab integrates with Ollama, AnythingLLM, and any provider compatible with the OpenAI API.
  • Lightweight Prompts: Optimized for small models such as Gemma4:e2b, Qwen3.5:2B, and Granite4:1B.
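The tool-registry principle above can be made concrete. The following is an illustrative Python sketch, not MrCrab's actual API (MrCrab itself is written in PHP, and all names here are hypothetical): instead of dumping every tool description into the prompt, the agent queries the registry by keyword and only the matches get injected.

```python
# Illustrative sketch of the keyword-queried tool registry (hypothetical names).

class ToolRegistry:
    def __init__(self):
        self._tools = {}  # name -> (keywords, description, callable)

    def register(self, name, keywords, description, fn):
        self._tools[name] = (set(k.lower() for k in keywords), description, fn)

    def search(self, query):
        """Return (name, description) pairs whose keywords appear in the query."""
        words = set(query.lower().split())
        hits = []
        for name, (keywords, description, _fn) in self._tools.items():
            if words & keywords:
                hits.append((name, description))
        return hits

registry = ToolRegistry()
registry.register(
    "weather_lookup",
    keywords=["weather", "forecast", "temperature"],
    description="Look up the current weather for a city.",
    fn=lambda city: f"Sunny in {city}",
)
registry.register(
    "send_email",
    keywords=["email", "mail", "notify"],
    description="Send an email to a recipient.",
    fn=lambda to, body: f"Sent to {to}",
)

# Only the matching tool's description is injected into the prompt,
# keeping it small enough for a 1-2B parameter model.
matches = registry.search("what is the weather in Lisbon")
```

The point of the design is the inverted flow: the model asks for tools relevant to the task at hand, rather than the framework pushing the full catalog into every prompt.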

Implementation Highlights

  • Modular architecture, written with simplicity in mind.
  • Transparent debugging and logging, so agent behavior is easy to inspect.
  • Easy integration with local or cloud‑based LLMs.

Lessons Learned

  • Large prompts and tool lists overwhelm small models.
  • Timeout handling must be explicit when working with local inference.
  • Summarization should be progressive, not premature.
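The timeout lesson deserves a concrete shape. Below is a hedged Python sketch (MrCrab itself is PHP; the helper name, defaults, and fallback string are hypothetical, though the endpoint shown is Ollama's documented `/api/generate`). Local inference can stall on long prompts, so the call fails fast and returns a sentinel instead of hanging the agent loop.

```python
# Sketch of explicit timeout handling around a local-inference call
# (hypothetical helper; endpoint is Ollama's default generate API).
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(prompt, model="qwen3.5:2b", url=OLLAMA_URL, timeout_s=60,
             fallback="(model timed out)"):
    """Call a local model, failing fast instead of hanging forever."""
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return json.loads(resp.read())["response"]
    except (TimeoutError, urllib.error.URLError):
        # Small local models can stall on long prompts; return a sentinel
        # so the agent loop can retry with a shorter prompt.
        return fallback
```

With cloud APIs a hung request is rare enough to ignore; with local inference it is routine, which is why the timeout must be explicit rather than left to library defaults.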

Future Work

  • Extending MrCrab to real business use cases, such as community management systems.
  • Adding support for multi‑agent collaboration.
  • Exploring long‑context training for small models.

MrCrab is our attempt to make agentic AI practical for small models. By focusing on lightweight prompts, dynamic tool discovery, and hybrid memory, we believe it can bridge the gap between experimental frameworks and production‑ready agents.
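The hybrid-memory idea mentioned above can be sketched roughly as follows. This is an illustrative Python sketch with hypothetical names (MrCrab itself is PHP), and the LLM-generated rolling summary is reduced to a counter for brevity: recent turns stay verbatim, overflow turns are folded into the summary, and everything lands in a persistent log for retrieval on demand.

```python
# Minimal sketch of hybrid memory (hypothetical names; summary is a
# stand-in counter where a real agent would call the LLM to summarize).

class HybridMemory:
    def __init__(self, keep_recent=4):
        self.keep_recent = keep_recent
        self.recent = []       # newest turns, kept verbatim in the prompt
        self.summarized = 0    # stand-in for an LLM-generated rolling summary
        self.log = []          # full persistent log for on-demand retrieval

    def add(self, role, text):
        self.log.append((role, text))
        self.recent.append((role, text))
        while len(self.recent) > self.keep_recent:
            self.recent.pop(0)
            self.summarized += 1  # a real agent would fold this turn into the summary

    def context(self):
        """Build the prompt context: rolling summary + verbatim recent turns."""
        parts = []
        if self.summarized:
            parts.append(f"[summary of {self.summarized} earlier turns]")
        parts += [f"{role}: {text}" for role, text in self.recent]
        return "\n".join(parts)
```

Because summarization only kicks in once the verbatim window overflows, it stays progressive rather than premature, matching the lesson above.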

MrCrab is written in PHP, without third‑party libraries. This design choice minimizes supply chain risks and ensures that the agent can be deployed in a secure and portable way. Developers can run MrCrab locally with minimal setup, while still benefiting from integrations with Ollama, AnythingLLM, and OpenAI‑compatible APIs.
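To show what a dependency-free, backend-agnostic call looks like, here is a Python sketch using only the standard library (MrCrab does the equivalent in plain PHP; the helper name is hypothetical). The three backends differ mainly in base URL, while the payload is the standard OpenAI-compatible chat-completions shape; Ollama's documented OpenAI-compatible base URL is `http://localhost:11434/v1`, and for AnythingLLM you should check its own docs for the correct path.

```python
# Sketch of building an OpenAI-compatible chat request with stdlib only
# (hypothetical helper; base_url is assumed to end in /v1).
import json
import urllib.request

def build_chat_request(base_url, model, messages, api_key=None):
    """Build an OpenAI-compatible chat-completions request for any backend."""
    payload = {"model": model, "messages": messages}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers=headers,
    )

# Same code path for Ollama, AnythingLLM, or OpenAI; only base_url
# (and, for cloud providers, api_key) changes.
req = build_chat_request(
    "http://localhost:11434/v1",
    model="qwen3.5:2b",
    messages=[{"role": "user", "content": "Hello"}],
)
```

Keeping the request-building in one place is what makes the backend pluggable: swapping providers is a configuration change, not a code change.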

Why We Integrated Both Ollama and AnythingLLM for Local AI in Community Management

At Communities of Neighbors Management System, privacy and transparency are at the heart of everything we build. That’s why we’ve integrated support for local large language models (LLMs), giving community administrators the ability to generate professional announcements and notifications without sending sensitive data to external services.

We started with Ollama, a powerful local LLM runner that makes it easy to deploy models directly on personal computers. With Ollama, announcements covering events such as water pipe bursts, lighting outages, or scheduled maintenance can be drafted quickly and securely, with all data staying inside the community's environment.

But we didn’t stop there. We also integrated AnythingLLM, which brings unique advantages for users on Windows 11 ARM64 devices powered by Qualcomm processors. Unlike Ollama, AnythingLLM supports NPUs (Neural Processing Units) natively, unlocking hardware acceleration and improved performance on modern ARM64 systems. This means faster inference, lower energy consumption, and smoother experiences for administrators working on Qualcomm-powered PCs.

Additionally, AnythingLLM offers RAG (Retrieval-Augmented Generation) capabilities. Administrators can connect shared documents within AnythingLLM, allowing the LLM to reference community-specific files when generating announcements. This makes notifications more accurate, contextual, and tailored to the needs of each building or neighborhood.

By supporting both Ollama and AnythingLLM, we give communities the freedom to choose the local AI solution that best fits their hardware and workflow. Whether it’s CPU-based inference with Ollama or NPU-accelerated generation with AnythingLLM, administrators can rely on our platform to deliver professional communication while safeguarding resident privacy.

Local AI is not just about performance—it’s about trust. With Ollama and AnythingLLM, Communities of Neighbors Management System empowers administrators to manage communication responsibly, securely, and efficiently.
