A Model of AI Agent Types

The last two posts looked at motivations for the C-by-B architecture and at how current AI behaviors hint at more dire future alignment issues. With this post we switch from concerns to remedies. We will start by grounding the C-by-B architecture in a model of AI agent types.

Efficiency and Efficacy – Can Constraint-by-Balance Deliver Both?

Reviewers of the Constraint-by-Balance architecture may raise two foundational concerns. First, efficiency: can externalized constraint operate fast enough for time-critical decisions? Second, efficacy: can historical harm patterns meaningfully constrain behavior in genuinely novel or complex contexts?

The Efficiency Challenge: Real Speed Requirements Versus Perceived Speed

I assume here that most domains where agentic AI will operate—strategic planning, medical diagnostics, infrastructure management, corporate decision making—will continue to function on human timescales: minutes to days, not milliseconds. These deliberative environments are well within the temporal range of full C-by-B evaluation.

Only a small subset of applications—collision avoidance, grid protection, high-frequency trading—require split-second responses. It is in this small subset that we actively want to, or will need to under adversarial pressure, take humans out of the loop and benefit from AI’s superior scope of comprehension, speed of action, and composure under pressure. Critically, these scenarios tend to involve:

  • Narrow and bounded task scope
  • Shallow reasoning chains
  • Low abstraction
  • Well-defined success metrics

These are not the contexts where emergent reasoning or structural misalignment is most likely to unfold. In these bounded domains, a Decisive Action Mode can be used: the Evaluator Twin applies a fast-check layer against an enumerated list of gross harms, defaulting to action unless unacceptable risk is immediately evident.

Importantly, this mode doesn’t bypass safety—it shifts the burden of proof:

  • For fast-execution cases: “Act unless unacceptable harm is clear.”
  • For deliberative reasoning: “Act only when acceptable and balanced harm is clear.”

This fundamental burden of proof shift handles the edge case problem: instead of trying to design one system that works for both contemplative strategic planning AND split-second emergency response, C-by-B is one architecture with two operational modes calibrated to different risk environments.
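The burden-of-proof shift can be made concrete with a small sketch. The names below (`Mode`, `Assessment`, `permit`) are illustrative assumptions, not part of any published C-by-B specification; the point is only that the two modes invert the default verdict.

```python
# Hypothetical sketch of the dual-mode burden-of-proof shift.
# All names here are illustrative, not from the C-by-B spec.
from dataclasses import dataclass
from enum import Enum, auto

class Mode(Enum):
    DECISIVE = auto()      # time-critical: act unless unacceptable harm is clear
    DELIBERATIVE = auto()  # default: act only when acceptable balance is clear

@dataclass
class Assessment:
    gross_harm_detected: bool   # fast check against enumerated gross harms
    balance_demonstrated: bool  # full harm-pattern evaluation passed

def permit(mode: Mode, a: Assessment) -> bool:
    if mode is Mode.DECISIVE:
        # Burden of proof on the blocker: default to action.
        return not a.gross_harm_detected
    # Burden of proof on the actor: default to restraint.
    return a.balance_demonstrated

# Same evidence, opposite defaults: with no clear finding either way,
# decisive mode proceeds while deliberative mode holds back.
assert permit(Mode.DECISIVE, Assessment(False, False)) is True
assert permit(Mode.DELIBERATIVE, Assessment(False, False)) is False
```

Note that the edge-case problem disappears by construction: nothing in the fast path waits on the full evaluation, and nothing in the deliberative path can act on its absence.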

In parallel with decisive action, and without constraining it, the Evaluator Twin can engage in deeper harm analysis and then require the Cognitive Twin to implement comprehensive contingency planning (emergency notifications, secondary response protocols, and stakeholder alerts), ensuring that rapid action occurs within a framework of coordinated protective measures. Decisive Action Mode thus provides minimal but necessary restraint while simultaneously marshaling supportive actions.

The vast majority of agentic AI applications, including those posing the greatest emergence risks, operate comfortably within timeframes that allow full constraint evaluation. Generalized emergence is more likely here because of longer reasoning chains in more varied environments, deeper memory across multiple episodes of reasoning, and more abstract goals, all of which provide richer conditions for unpredictable recursive self-modification.

Fortunately, in these situations the need for speed is more perceptual than actual. It will be experienced primarily by humans interacting with AI in sessions of collaborative thinking, and precisely because of this there are UI/UX solutions: essentially progressive disclosure, with additional work proceeding in the background. Additionally, a dedicated Evaluator relocates latency already spent on safety checks into a distinct, more robust architectural component, kept separate from optimization.

This dual-mode capability avoids the false dichotomy between “always safe” and “always fast,” and recognizes that different risk regimes demand different operational postures.

Agent Working Zones

C-by-B is envisioned as a design framework that can be extended across the agent landscape. By intersecting hardware limitations (computational budget, connectivity) with contextual stakes (latency needs, reversibility, harm magnitude, and others), the framework establishes seven distinct “working zones,” ranging from edge-based autonomous defense (Zone 1) to high-level strategic advisory (Zone 7). Matched to these zones, five specific Evaluator classes fall into two families: “Gate Keepers,” which execute sub-second, binary vetoes against immediate, irreversible harms; and “Action Shapers,” which engage in iterative assessment to refine complex or strategic outputs.
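The zone-to-family mapping above can be sketched as a simple classifier. The seven zones and the two Evaluator families come from the post; the field names and the specific thresholds below are invented assumptions for illustration (the real criteria live in Appendix Four of the explainer).

```python
# Illustrative sketch: zones and Evaluator families are from the post,
# but these thresholds and field names are assumptions.
from dataclasses import dataclass

@dataclass
class AgentContext:
    zone: int                  # 1 (edge autonomous defense) .. 7 (strategic advisory)
    latency_budget_ms: float   # contextual stakes: how fast a verdict is needed
    irreversible: bool         # whether the potential harm can be undone

def evaluator_family(ctx: AgentContext) -> str:
    # Gate Keepers: sub-second, binary vetoes for immediate, irreversible harms.
    if ctx.latency_budget_ms < 1000 and ctx.irreversible:
        return "Gate Keeper"
    # Action Shapers: iterative assessment refining complex or strategic outputs.
    return "Action Shaper"

assert evaluator_family(AgentContext(zone=1, latency_budget_ms=50,
                                     irreversible=True)) == "Gate Keeper"
assert evaluator_family(AgentContext(zone=7, latency_budget_ms=60_000,
                                     irreversible=False)) == "Action Shaper"
```

The design choice worth noting is that the family is a function of stakes and latency, not of the agent's capability level: a highly capable agent in a slow, reversible context still gets the iterative Action Shaper treatment.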

For more on the agent working zones, see Appendix Four of the comprehensive C-by-B explainer. In the next post, we will focus on specific requirements for the Evaluator, regardless of working zone.

