
Everything your team does for you.

In plain language first. The technical details are further down for anyone who wants them.

Apply for a Discovery Call

Four kinds of projects. One team.

Not every project is a software build. TinyFirm handles four kinds of work: building something (a website, a tool, a client system), exploring ideas (brainstorming, concept validation), researching (market analysis, competitive intelligence), or a mix of all three. You describe what you need during the intake. The right specialists show up.

Your team remembers everything.

Conversations, decisions, project details. All of it, permanently. Close your laptop for three weeks. Come back. Your team knows exactly where you left off and what to do next. The longer you work together, the smarter they get about your business. Month three feels completely different from month one.

Every project makes the next one better.

Finish your first project and start your second. Your new team already knows what worked, what didn’t, and what to avoid. Lessons from previous projects feed into future ones automatically. By project three, the system knows your patterns, your preferences, and your standards.

Nothing happens without your say-so.

Before your team does anything, they tell you what they plan to do and wait for your approval. You’re never surprised by changes you didn’t ask for. Approve a batch of work, or review each task individually. You’re the boss.

See exactly what’s happening, live.

A real-time dashboard shows you which team member is working on what, what phase your project is in, and what got done today. Open it in a browser, drag it to a second screen, and watch your project come together. Everything updates automatically. You never have to ask “where are we?”

Every change is saved automatically.

If something goes wrong, you can go back to any previous version with one click. Think of it as an unlimited undo button for your entire project. You’ll never lose work or wonder what changed. The complete history is right there whenever you need it.

Your work is checked five different ways.

Three of those checks are fully automatic and can’t be skipped or overridden. Your team doesn’t just say “it looks good.” The system verifies it from multiple angles before anything goes live. You don’t need to understand how the checks work. You just need to know: nothing ships until it’s been verified.

Your clients’ data is protected.

Every project gets automatic security checks before it launches. Your clients’ information is protected by the same standards used by banks and hospitals. This isn’t something you pay extra for or remember to turn on. It’s built into every project, every time.

The tool gets better because you use it.

Refine how your team works, what they check for, how they communicate. Every improvement you make carries forward to future projects. You’re not locked into defaults. You shape the system to fit how you work, and it stays that way.


Under the Hood

The technical details. For clients with developers on their team, or anyone who wants to know exactly how it all works.


Not everything is a software build.

TinyFirm generates teams for four types of work. The intake interview determines the track. The team composition adapts.

Build

Full software development. Frontend, backend, security, QA, documentation, DevOps. The intake maps your tech stack, compliance requirements, architecture preferences, and testing strategy. The team ships production code with persistent memory, quality gates, and security scanning baked in.

Example team: Pixel (Frontend), Wrench (Backend), Sentinel (Security), Atlas (QA), Quill (Docs). 5-7 agents typical.

Ideation

No code. The team brainstorms, explores, and evaluates. Business model canvases, market sizing, concept validation, competitive positioning. The deliverable is a recommendation you can act on, not a repository.

Example team: Scout (Research), Prism (Strategy), Beacon (Analysis). 3-4 agents typical.

Research

Deep investigation. Market analysis, competitive intelligence, data synthesis, trend mapping. Primary research (surveys, interviews) or secondary (reports, public data, market analysis). The deliverable is structured knowledge: reports, presentations, spreadsheets, or executive summaries.

Example team: Scout (Research Lead), Beacon (Quantitative Analysis), Quill (Report Writing). 2-3 agents typical.

Hybrid

Blend any of the above. Research a market, then build the product. Ideate three concepts, then prototype the winner. The tracks are composable. Phases adapt to what the project needs at each stage.

Example team: Varies. A hybrid project might start with 3 research agents and expand to 6 build agents when the scope crystallizes.

Who this matters to: Business owners with projects that go beyond software. Professionals exploring ideas before committing resources. Anyone whose work doesn’t fit neatly into one category.

Three layers of compounding intelligence.

TinyFirm gets smarter in three distinct ways. Each layer compounds independently.

Layer 1: Per-project memory.

Within a single project, agents accumulate knowledge across every session: what was built, which patterns work, what to avoid, your preferences, your constraints. Session 47 starts with everything learned since session 1. This is the foundation.

Layer 2: Cross-project intelligence.

The Big Brain system aggregates lessons across all your projects. Agent effectiveness, team configurations that work, architectural decisions that shipped well, anti-patterns that wasted time. Your tenth project starts with the collective intelligence of the previous nine.

Layer 3: Workspace-level improvement.

Every change you make to agent definitions, quality gates, memory protocols, or delegation rules inside the workspace is inherited by every future project you create. Better defaults compound forever. A security checklist refined in March protects every project started in April, May, and beyond.
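As a minimal sketch of how workspace-level inheritance could behave, the Python below copies workspace defaults into each new project. The dictionary keys, the checklist items, and the `new_project_config` helper are hypothetical names chosen for illustration; they are not TinyFirm's actual file layout or API.

```python
import copy
from typing import Optional

# Hypothetical workspace defaults; the keys are illustrative only.
workspace_defaults = {
    "security_checklist": ["no hardcoded credentials", "validate all input"],
    "quality_gate": {"typecheck": True, "lint": True, "tests": True},
}


def new_project_config(defaults: dict, overrides: Optional[dict] = None) -> dict:
    """Start a project from a deep copy of the current workspace defaults."""
    config = copy.deepcopy(defaults)
    config.update(overrides or {})
    return config


# A checklist item added to the workspace in March...
workspace_defaults["security_checklist"].append("rate-limit auth endpoints")

# ...is already present in a project created in April.
april_project = new_project_config(workspace_defaults, {"name": "april-site"})
```

The deep copy is a design choice in this sketch: a project can diverge from the workspace without silently changing the defaults that later projects inherit.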

The result: You are not just using a tool. You are building an institutional knowledge base that compounds with every project, every session, and every improvement you make. The system is better tomorrow than it is today. Not because we shipped an update. Because you used it.

Your tenth project benefits from the first nine.

Every TinyFirm project generates a structured HQ report when you save progress: what was built, what worked, what did not, team effectiveness, and generalizable lessons. These reports accumulate in your workspace.

Say “sync big brain.” The system aggregates every HQ report into cross-project intelligence: which agent configurations perform best, which patterns produce reliable output, which anti-patterns waste cycles, and how team size should scale with scope.

What compounds

  • Agent playbook. Proven configurations, team-size guidelines, role combinations that work (and ones that do not).
  • Pattern library. Architectural decisions that shipped well across projects. Decisions that caused rework.
  • Team calibration. Which agents performed well, which need refinement, which roles overlap.
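One way to picture the aggregation step is the Python sketch below, which keeps only lessons that recur across projects. The `HQReport` fields, the `sync_big_brain` function name, and the "seen in at least two projects" rule are assumptions made for illustration; the real report format is not described here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HQReport:
    """Hypothetical shape of one project's HQ report."""
    project: str
    worked: list[str] = field(default_factory=list)
    did_not_work: list[str] = field(default_factory=list)


def sync_big_brain(reports: list[HQReport], min_projects: int = 2) -> dict[str, list[str]]:
    """Keep lessons that appear in at least min_projects separate reports."""
    # set() ensures a lesson is counted once per project, not once per mention.
    worked = Counter(item for r in reports for item in set(r.worked))
    failed = Counter(item for r in reports for item in set(r.did_not_work))
    return {
        "patterns": sorted(i for i, n in worked.items() if n >= min_projects),
        "anti_patterns": sorted(i for i, n in failed.items() if n >= min_projects),
    }
```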

This is not per-project memory (that compounds within a single engagement). This is portfolio intelligence. The kind of institutional knowledge that usually requires a decade of engineering leadership to accumulate, delivered structurally across every project you run.

Who this matters to: Anyone who plans to build more than one project. Business owners managing multiple ventures. Developers who want each project to benefit from everything that came before.

Nothing happens without your say-so.

Every action in TinyFirm follows a three-step cycle: Present, Approve, Execute.

Present. Ace describes what will be done, which agent handles it, and the expected outcome. This is a plan, not a notification.

Approve. You review the plan and give the go-ahead. Work does not begin on silence, ambiguity, or assumed intent.

Execute. Only after your explicit approval does Ace delegate to the specialist. The agent works within the scope you approved.

Batch approval for efficiency. Approve a multi-step plan and the team works through it without interrupting you at every step. Five bug fixes? Approve the batch. The team handles them in sequence while you do other things.

New tasks restart the cycle. If you raise something new mid-session, even while approved work is in progress, the cycle resets. Present. Approve. Execute. No assumptions.
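The cycle can be sketched as a small gate object, shown here in Python. The `ApprovalCycle` class and its method names are hypothetical; the sketch only illustrates the rules stated above: nothing executes without approval, a batch is approved as a unit, and presenting anything new clears the earlier approval.

```python
class ApprovalCycle:
    """Hypothetical gate: execution is refused unless a plan was approved."""

    def __init__(self) -> None:
        self.plan: list[str] = []
        self.approved = False

    def present(self, tasks: list[str]) -> None:
        # Presenting a new plan always resets approval, even mid-session.
        self.plan = list(tasks)
        self.approved = False

    def approve(self) -> None:
        if not self.plan:
            raise RuntimeError("nothing has been presented")
        self.approved = True

    def execute(self) -> list[str]:
        if not self.approved:
            raise PermissionError("plan has not been approved")
        # Hand back the approved batch and return to the unapproved state.
        done, self.plan, self.approved = self.plan, [], False
        return done
```

Approving a list of five bug fixes and calling `execute()` once corresponds to the batch approval described above.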

Example interaction

You

“The login page needs rate limiting and the API needs input validation.”

Ace

“Here’s my plan:

  1. Wrench adds rate limiting to /api/auth endpoints (express-rate-limit, 5 attempts per 15 minutes)
  2. Wrench adds Zod validation schemas to all public API routes

Wrench handles both since they’re backend tasks. Expected output: committed code with passing tests for both changes. Want me to proceed?”

You

“Go ahead.”

Ace

[delegates to Wrench with full context, key files, and expected output]

This is not a convenience feature. It is a structural constraint. The delegation protocol enforces it mechanically: Ace cannot delegate without presenting first. The system is designed so that action outside an approved plan is architecturally impossible.

Who this matters to: Anyone who wants to stay in control of what gets built. Business owners who need to trust that their project is moving in the right direction. Anyone who’s been burned by a tool that acted without asking first.

Improve the system itself.

TinyFirm is not a black box. Every agent definition, memory protocol, quality gate, and delegation rule lives in editable files inside your workspace. Specialist Hat Mode is how you improve them.

Say “wear your specialist hats.” Ace activates a mode where your team reviews and refines its own configuration: agent scopes, constraint lists, memory structures, and quality checklists. Every improvement is tested against the current project and, if validated, written back to the workspace.

What you can customize

  • Agent personality, scope, and constraint definitions.
  • Quality gate thresholds and checklist items.
  • Memory protocol structure and condensation rules.
  • Delegation protocols and reporting formats.

Why this matters: Every change you make to the workspace is inherited by all future projects created from it. A better security checklist today means better security audits on every project you start tomorrow. A refined agent definition means more reliable output across the board.

You are not locked into the defaults. You own the configuration. You improve it over time.

Who this matters to: Power users who want to customize how their team operates. Developers who want control over the system itself. Anyone who’s ever wished they could fine-tune how an AI tool behaves.

Full audit trail. Zero manual commits.

Every agent task is automatically committed to a local Git repository when it completes. Commit messages follow the Conventional Commits format, with the agent’s name appended after a pipe:

  • feat: rate limiting on public API routes | Wrench
  • fix: auth token refresh race condition | Bolt
  • refactor: extract validation middleware | Sentinel
  • test: add integration tests for user endpoints | Atlas
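The message layout in these examples can be captured with a small formatter and parser. This Python sketch is inferred from the four examples only; the `format_commit` and `parse_commit` helpers are hypothetical, and the real set of allowed commit types may be larger.

```python
import re

# Commit types seen in the examples above; the actual allowed set may differ.
COMMIT_TYPES = ("feat", "fix", "refactor", "test")
PATTERN = re.compile(r"^(?P<type>[a-z]+): (?P<summary>.+) \| (?P<agent>\w+)$")


def format_commit(kind: str, summary: str, agent: str) -> str:
    """Build a message of the form 'type: summary | Agent'."""
    if kind not in COMMIT_TYPES:
        raise ValueError(f"unknown commit type: {kind}")
    return f"{kind}: {summary} | {agent}"


def parse_commit(message: str) -> dict[str, str]:
    """Split a message back into its type, summary, and agent name."""
    match = PATTERN.match(message)
    if match is None:
        raise ValueError(f"unrecognized commit message: {message}")
    return match.groupdict()
```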

The full history is browsable in Cursor’s Source Control panel. Click any commit to see exactly what changed, line by line. Compare any two points in time. Revert to any previous state with a single command.

This is not opt-in. It is not configured per-project. It happens automatically after every task completes, enforced by the delegation protocol. The working tree must be clean before any new delegation proceeds. Uncommitted changes block the pipeline.
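A clean-tree check of this kind is commonly built on `git status --porcelain`, which prints nothing when there are no uncommitted changes. The Python sketch below shows that idea; the function names are hypothetical, and it is an assumption that TinyFirm's delegation protocol checks the tree this way.

```python
import subprocess


def is_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per change, or nothing."""
    return porcelain_output.strip() == ""


def working_tree_is_clean(repo_path: str) -> bool:
    """Ask Git whether the repository at repo_path has uncommitted changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails (for example, not a repository)
    )
    return is_clean(result.stdout)
```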

What this means in practice

  • Complete audit trail of who changed what and why.
  • Instant rollback when something breaks.
  • Blame any line to see which agent wrote it and during which task.
  • Full confidence to experiment. Every checkpoint is recoverable.

Purely local. Nothing is pushed to GitHub, GitLab, or any remote service unless you explicitly configure it. Your code history stays on your machine.

Who this matters to: Anyone who wants a complete record of what was built and when. Business owners who need peace of mind that nothing gets lost. Developers who want full accountability and instant rollback.

Queue work overnight. Wake up to results.

Optional

NightOwl is a macOS menu bar app that schedules local LLM tasks via Ollama. Queue analysis jobs, data processing, or batch transformations before you leave for the day. They run locally on your machine using free, open-source models. Results are ready by morning.

How it works

  • Click the menu bar icon. Add a task to the queue.
  • Select a model (DeepSeek-R1 70B, Qwen 2.5 32B, Llama 3.3, or any Ollama-compatible model).
  • Set the schedule: run immediately, run at a specific time, or run when the machine is idle.
  • Results are written to a local output directory or delivered to a remote server via SCP.

Use cases

  • Analyze a large codebase overnight and have a summary ready in the morning.
  • Process a batch of data files through a local model without tying up Cursor.
  • Run multiple analysis passes with different models and compare results.
  • Deliver processed output to a remote staging server automatically.

NightOwl runs independently of Cursor. You do not need an active editor session. The app sits in your menu bar, manages the queue, and handles model loading and unloading for you.

Cost: $0. Local inference only. No API calls, no cloud dependencies, no usage fees.

System Requirements

  • macOS Sonoma (14) or newer with Apple Silicon.
  • 48GB unified memory for full 70B model access (for example, an M4 Pro, M3 Max, or M4 Max configured with 48GB or more).
  • 32GB unified memory for 32B models only (M1 Pro/Max, M2 Pro/Max, or newer).
  • 70GB free disk space for model downloads.
  • Ollama installed (free, open-source).

NightOwl runs overnight batch jobs. Even at the minimum spec for 70B models (48GB, ~5 tok/s), a complex analysis completes in minutes, not hours. Speed matters less when the machine works while you sleep.
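For a rough sense of where these numbers come from, here is a back-of-the-envelope calculation in Python. It assumes 4-bit quantized weights (about half a byte per parameter) and a hypothetical 2,000-token response; real quantized model files are usually somewhat larger, and the KV cache and operating system need memory on top of the weights.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the model weights alone, in gigabytes."""
    return params_billion * bits_per_weight / 8


def generation_minutes(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate output_tokens at a steady decoding speed."""
    return output_tokens / tokens_per_second / 60


print(weights_gb(70))               # 35.0 GB of weights for a 70B model
print(weights_gb(32))               # 16.0 GB of weights for a 32B model
print(generation_minutes(2000, 5))  # roughly 6.7 minutes at ~5 tok/s
```

Under these assumptions a 70B model leaves limited headroom on a 48GB machine, while a 32B model fits comfortably in 32GB.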

Who this matters to: Developers and power users who want their computer working while they sleep. Anyone with repetitive analysis tasks that don’t need real-time attention.

Deep analysis for large codebases.

Optional

When delegated files exceed 100KB total, standard single-model analysis loses fidelity. The RLM Analyzer uses a two-model pipeline to maintain depth at scale.

The pipeline

  1. Fast model (Qwen 2.5 32B): Reads the full file set and writes analysis code: chunking strategies, extraction patterns, and domain-specific queries tailored to the task.
  2. Reasoning model (DeepSeek-R1 70B): Executes the analysis code against each chunk, performing deep reasoning with full chain-of-thought on complex sections.

The result is a structured findings JSON that surfaces what a single-pass model would miss: cross-file dependencies, subtle logic errors, architectural drift, and dormant technical debt.
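The size trigger and the chunking step can be illustrated with a short Python sketch. It assumes that 100KB means 100 × 1024 bytes and that chunks are split on line boundaries; both are illustrative choices, and the real analyzer writes its own chunking strategy per task. The function names and the default chunk size are hypothetical.

```python
THRESHOLD_BYTES = 100 * 1024  # assumed interpretation of the 100KB trigger


def needs_deep_analysis(file_sizes: dict[str, int]) -> bool:
    """True when the delegated files exceed the threshold in total."""
    return sum(file_sizes.values()) > THRESHOLD_BYTES


def chunk_lines(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into chunks of whole lines, each at most max_chars long.

    A single line longer than max_chars becomes a chunk of its own.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```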

When it runs

  • Automatically during phase-gate reviews when key files exceed the 100KB threshold. Ace detects this and includes analyzer instructions in the delegation.
  • Manually when you want deep analysis on a specific file or directory. Run the script directly.

Both models run locally via Ollama. No API costs. No data leaves your machine.

System Requirements

  • 48GB unified memory (Apple Silicon) or 42GB+ VRAM (NVIDIA dual-GPU / workstation GPU).
  • Works on macOS, Linux, and Windows wherever Ollama runs.
  • 70GB free disk space for both models.

Fallback: Users with 32GB memory can swap the 70B reasoning model for a 32B model with minimal quality loss (within 1-3% on code/math benchmarks).

Who this matters to: Developers working with large, complex projects. Anyone whose codebase has grown past the point where a standard AI tool can understand the full picture.

Five models. Five providers. One consensus.

Before any phase advances in a Build project, TinyFirm runs a multi-model adversarial review. Five AI models from different providers independently analyze the phase output. They do not see each other’s results. They do not collaborate.

The models

  • DeepSeek-R1 70B (local via Ollama, free)
  • Llama 3.3 70B (via Groq, free tier)
  • Gemini 2.5 Flash (via Google, free tier)
  • Qwen 3 235B (via Cerebras, free tier)
  • o4-mini (via OpenAI, ~$0.04 per review)

When multiple models independently flag the same issue, it is almost certainly real. A single model hallucinating a false positive gets outvoted. Consensus findings surface genuine problems: logic errors, security gaps, performance regressions, missing edge cases.

This is not an optional code review you can dismiss. It is a mechanical gate. The phase does not advance until findings are addressed. It cannot be skipped, edited, or waved through.
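The voting idea can be sketched in a few lines of Python. This assumes each model's findings have already been normalized to comparable identifiers, which is the hard part in practice and is not shown. The function names, the finding labels in the usage example, and the two-vote threshold are hypothetical.

```python
from collections import Counter


def consensus_findings(reviews: dict[str, set[str]], min_votes: int = 2) -> list[str]:
    """Return findings flagged independently by at least min_votes models."""
    votes = Counter(f for findings in reviews.values() for f in findings)
    return sorted(f for f, n in votes.items() if n >= min_votes)


def phase_may_advance(
    reviews: dict[str, set[str]], resolved: set[str], min_votes: int = 2
) -> bool:
    """The gate opens only when every consensus finding has been resolved."""
    return all(f in resolved for f in consensus_findings(reviews, min_votes))


reviews = {
    "deepseek-r1": {"sql-injection", "missing-null-check"},
    "llama-3.3": {"sql-injection"},
    "gemini-2.5-flash": {"missing-null-check", "unused-import"},
    "qwen-3": set(),
    "o4-mini": {"sql-injection"},
}
print(consensus_findings(reviews))  # ['missing-null-check', 'sql-injection']
```

In this example the finding raised by only one model (`unused-import`) is outvoted, while the two raised by several models block the phase until resolved.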

Cost: Four of the five models run free. Total cost per phase-gate review is approximately $0.04. Comparable human review time would cost thousands.

Who this matters to: Anyone who wants independent verification that their project was built correctly. Business owners who can’t review every technical detail themselves. When multiple AI models agree something needs fixing, it almost certainly does.

Five layers. Three automated. None rely on self-assessment.

TinyFirm does not trust agents to evaluate their own output. Quality enforcement is structural: deterministic gates that block bad code, independent reviewers that catch what authors miss, and security tools that scan for what both miss.

Layer 1: Agent Self-Review

Every code-owning agent has a domain-specific quality checklist in their definition file. Security agents check for hardcoded credentials, XSS vectors, and auth bypasses. React agents check dependency arrays, render-loop risks, and server/client boundaries. Backend agents check input validation, rate limiting, and query safety. Before writing their summary, agents review their work against this checklist.

This is the weakest layer. It exists because it catches obvious mistakes cheaply. It is not trusted.

Layer 2: Deterministic Quality Gate

After every agent task, a shell script runs the project’s TypeScript typecheck, linter, and test suite. This is mechanical. It reads compiler output. It does not interpret, it does not make judgment calls, it does not get tired. Code that produces type errors does not get committed. Code that fails tests does not get committed. There is no override flag.

If the gate fails, the agent is re-delegated automatically with the failure output. Fix and re-run. No human intervention required for routine errors.
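The gate described above is a shell script; the Python stand-in below shows the same idea of blocking on exit codes rather than on anyone's judgment. The `run_gate` function and the exact commands in `CHECKS` are assumptions for a typical TypeScript project, not the project's real script.

```python
import subprocess


def run_gate(commands: list[list[str]]) -> tuple[bool, str]:
    """Run each check in order and stop at the first non-zero exit code."""
    for command in commands:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            # The captured output is what would be handed back to the agent.
            return False, result.stdout + result.stderr
    return True, ""


# Hypothetical checks for a TypeScript project.
CHECKS = [
    ["npx", "tsc", "--noEmit"],  # typecheck only, emit no files
    ["npx", "eslint", "."],      # lint
    ["npm", "test"],             # test suite
]
```

A caller would commit only when `run_gate(CHECKS)` returns `True`, and otherwise re-delegate the task with the returned failure output attached.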

Layer 3: Snyk Security Scan

After the quality gate passes, Snyk scans the changed files for known vulnerabilities. High and critical findings are reported before commit. The team does not proceed on insecure code without explicit human acknowledgment.

This is automatic. No approval required to run. No cost.

Layer 4: Phase-Gate Multi-Agent Review

Before advancing from one phase to the next, Ace dispatches 2-3 agents in parallel to review the phase output. A backend engineer reviews a frontend agent’s API integration. A security agent reviews auth flows. A QA agent reviews test coverage. Each reviewer produces findings independently.

Critical and should-fix findings must be resolved before the phase advances. This is where cross-cutting issues surface: the security gap that the backend agent did not think about, the edge case the frontend agent did not test.
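The blocking rule can be expressed as a simple filter over reviewer findings, sketched here in Python. The `Finding` fields and the "nice-to-have" severity label are hypothetical; only "critical" and "should-fix" come from the description above.

```python
from dataclasses import dataclass

BLOCKING = {"critical", "should-fix"}


@dataclass
class Finding:
    reviewer: str
    severity: str  # "critical", "should-fix", or a hypothetical "nice-to-have"
    summary: str
    resolved: bool = False


def blocking_findings(findings: list[Finding]) -> list[Finding]:
    """Findings that must be resolved before the phase can advance."""
    return [f for f in findings if f.severity in BLOCKING and not f.resolved]


def phase_can_advance(findings: list[Finding]) -> bool:
    return not blocking_findings(findings)
```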

Layer 5: Security Hardening Phase

Before any Build project deploys to production, the security agent runs a full audit:

  • OWASP ZAP (free): spider + active scan covering the OWASP Top 10.
  • Nuclei (free): 9,000+ vulnerability templates, known CVEs.
  • AgentShield (free): agent configuration audit, 102 static rules checking for prompt injection and unsafe permissions.
  • Shannon (enhanced tier, ~$50/run): autonomous AI pentester. Real exploit proofs. Zero false positives. “No exploit, no report.”

The project does not deploy until the security agent produces a signed-off report with a GO/NO-GO recommendation.

The result: Five layers. Three checks are fully automated: the quality gate, the Snyk scan, and the multi-model adversarial review that runs at each phase gate. Two involve agent judgment but with mechanical enforcement. No single point of failure. No “the agent said it looked fine.”

Who this matters to: Anyone launching a project that needs to work reliably from day one. Business owners whose reputation depends on things working right. Developers who’ve seen AI produce output that looked correct but wasn’t.

See how it works for your project.

Every team is custom-generated. The discovery call is where we figure out what yours looks like.

Apply for a Discovery Call

$555/mo + $2,200 one-time setup. Cursor subscription required separately. Full pricing details at /pricing.