Category: Design

  • Why today’s AI behaviors hint at more dire alignment futures

    LLMs and AI agents are edging toward systems that learn, adapt, and reorganize themselves. Even in today’s constrained settings, we’ve already seen glimpses of behaviors that, if allowed to evolve under continuous learning, could destabilize into something far more dangerous. This post examines three such signals: each is observable now, and each becomes more severe when…

  • The Motivation for Constraint-by-Balance: The Safety Gap After Deployment

    What will the future look like once it’s populated with all manner of AI agents? Do our current safety approaches fully encompass the risks of that future? The best-known approaches to AI safety (RLHF, Constitutional AI, scalable oversight, interpretability research) have made remarkable progress in aligning model behavior during training and evaluation. These methods…