PhilipMat

Two macOS tools for sandboxing agents

Both Agent Safehouse and Nono (get it, no-no?) use macOS sandboxing to execute agents.

Agent Safehouse

Pull down a self-contained Bash script with curl and drop it in ~/.local/bin. Run your agent command prefixed with safehouse, e.g. safehouse opencode.
The tool auto-detects the git root of the working directory, applies a deny-all baseline, and layers on permissions for common toolchains.
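The git-root auto-detection step can be sketched in a few lines of shell — a hedged approximation, not Agent Safehouse's actual implementation:

```shell
# Resolve the sandbox root the way a tool like safehouse might:
# use the git toplevel if we're inside a repo, else fall back to
# the current working directory.
root="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"
echo "sandbox root: $root"
```

On macOS, the deny-all baseline would then presumably be expressed as a sandbox profile that grants access only under that root plus the toolchain paths.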

Nono

Same, but installed with brew. Then nono run --profile claude-code -- claude to run a sandboxed agent.

Nono works on Linux as well; Agent Safehouse is macOS only. Nono is written in Rust, while Agent Safehouse is all fish-shell scripting.

Founderland.ai mentions a few others:

Microsandbox and Agent Harbor lean on VM-level isolation. DevCage and AgentSphere target multi-platform or cloud deployments. Kilntainers gives each agent an ephemeral Linux sandbox via containers or microVMs.

May be worth investigating. It’s a fledgling space, so new tools will come and go.

TIL: Essential Claude Code Skills and Commands

Summary

The article explains the difference between Claude Code’s built-in slash commands and its prompt-based skills: slash commands are fixed, non-AI operations (like /clear or /model), while skills load instruction files into Claude’s context and can spawn subagents, accept arguments, use tools, and include supporting files and frontmatter. The commands-and-skills systems have been unified under the /slash interface, with .claude/skills/ recommended for new customizations because it supports richer features (templates, dynamic context, subagents, and more).

It then surveys the most useful built-in skills and commands: /simplify (automated code-quality review that spawns parallel reviewers and can auto-fix issues), /review (thorough code/PR review for bugs and edge cases), /batch (decomposes large refactors into parallel worktree agents), /loop (recurring scheduled prompts), /debug (session diagnostics), and /claude-api (loads API reference material). Helpful slash commands covered include /compact (conversation compression), /diff (interactive diff of Claude’s edits), /btw (side questions without polluting context), /copy (copy code to clipboard), and /rewind (undo changes). The piece highlights practical workflows—e.g., run /review for correctness then /simplify for cleanup—and recommends listing available skills with /skills.
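For reference, a minimal custom skill under .claude/skills/ might look like the sketch below. The directory layout and frontmatter fields are assumptions based on the article's description (instruction file plus frontmatter), not an authoritative spec:

```shell
# Create a minimal custom skill: a markdown instruction file with YAML
# frontmatter under .claude/skills/<skill-name>/ (layout assumed).
mkdir -p .claude/skills/release-notes
cat > .claude/skills/release-notes/SKILL.md <<'EOF'
---
name: release-notes
description: Draft release notes from the commits since the last tag.
---

Summarize the commits since the most recent git tag into user-facing
release notes, grouped by feature, fix, and chore.
EOF
```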

Commands/skills I didn’t know about, and they seem useful:

  • /btw lets you ask a side question without affecting the main conversation context
  • /simplify reviews your recently changed files for code reuse opportunities, quality issues, and efficiency improvements – and then fixes them automatically.
  • /review gives you a proper code review of your changes – the kind of feedback you’d expect from a thorough pull request review.
    [..] My typical workflow is: make changes, run /review to catch issues, fix anything it flags, then run /simplify to clean things up.

Also /copy to copy code to clipboard, with a selector for multiple changes, and /rewind to roll back to a certain point in order to explore a new path.

Source: Essential Claude Code Skills and Commands

TIL: Single-executable local LLM

Summary

llamafile is a single-file executable format that packages an open LLM’s runtime and weights so the model can run locally with no installation. By combining llama.cpp with Cosmopolitan Libc, a llamafile contains everything needed to execute a model on a user’s machine and aims to make open LLMs more accessible to developers and end users.

Technically, llamafiles add runtime dispatching for multiple CPU microarchitectures and concatenate AMD64 and ARM64 builds so the appropriate binary runs on each system. The format targets six OSes (macOS, Windows, Linux, FreeBSD, OpenBSD, NetBSD) and supports embedding weights via PKZIP in the GGML library for memory-mapped, self-contained distribution. The project provides tooling to create and distribute llamafiles, is an Apache 2.0-licensed project with MIT-licensed changes to llama.cpp, and has recently been adopted by Mozilla.ai, which is soliciting community feedback on modernization plans.

Because it’s an LLM, this is akin to having Wikipedia offline, but you can ask it questions.

Also powered by Cosmopolitan libc, the explanation of which is an amazing work in itself.

Source: TIL: Single-executable local LLM

TIL: lnav - a fast log viewer with remote capabilities

A TUI for log files.

Summary

lnav is a terminal-based log file viewer that lets you merge, tail, search, filter, and query log files without any server or complex setup. It automatically detects file formats, unpacks compressed files on the fly, and provides online help and operation previews to simplify use.

Designed for performance, lnav can outperform standard terminal tools when processing large logs and exposes a SQLite interface for advanced querying. The project includes an introductory video and documentation to help users get started.

The remote-host tailing feature is kind of cool.

When lnav accesses a remote host, it transfers an agent (called the “tailer”) to the host to handle file system requests from lnav. The agent is an αcτµαlly pδrταblε εxεcµταblε that should run on most X86 Operating Systems. The agent will monitor the files of interest and synchronize their contents back to the host machine.

The only setup required is to ensure the machines can be accessed via SSH without any interaction, meaning the host key must have been previously accepted and public key authentication configured.
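That "without any interaction" prerequisite can be checked up front with ssh's BatchMode, which fails fast instead of prompting (user@host is a placeholder, not a real target):

```shell
# Verify that key-based, prompt-free SSH works before pointing lnav at
# the host; BatchMode disables password prompts so failures surface
# immediately instead of hanging on interactive input.
if ssh -o BatchMode=yes -o ConnectTimeout=5 user@host true 2>/dev/null; then
  status="ok: non-interactive ssh is configured"
else
  status="fix: accept the host key and set up public-key auth first"
fi
echo "$status"
```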

Source: TIL: lnav - a fast log viewer with remote capabilities

TIL: PR (anti-) patterns in the world of agentic AI

Simon Willison on PR (anti-) patterns in the world of agentic AI

There are some behaviors that are anti-patterns in our weird new world of agentic engineering.

It’s so easy to create AI PR slop in this new world, even on private teams.

If you open a PR with hundreds (or thousands) of lines of code that an agent produced for you, and you haven’t done the work to ensure that code is functional yourself, you are delegating the actual work to other people.

These are good rules for even human-created PRs, and these stood out:

The change is small enough to be reviewed efficiently without inflicting too much additional cognitive load on the reviewer. Several small PRs beats one big one […]

The PR includes additional context to help explain the change. What’s the higher level goal that the change serves? […]

Agents write convincing looking pull request descriptions. You need to review these too! It’s rude to expect someone else to read text that you haven’t read and validated yourself.