Skill · v1.0.0 · MIT

error-handling

Design a coherent error-handling strategy across a codebase - what to throw vs return, where to catch, how to log, how errors surface to users and to operators. Use when the user asks about error handling, exception design, Result types, error boundaries, or wants to review how errors flow through the code.

elyra › /skills install error-handling

The hardest part of error handling isn't catching errors — it's deciding where each kind of error belongs and who is responsible for it. Get that wrong and you end up with try/catch everywhere and nothing actually handled.

When to use

"How should we handle errors in X?"
"Design an error strategy"
"Should this throw or return?"
"Review error handling in this code"
"Errors aren't being logged / are too noisy"

Core principles

1. Classify errors by who can do something about them

Class	Who handles it	Example
Bug	The developer (after deploy)	Null deref, off-by-one, wrong type
Expected user error	The user (via feedback)	Invalid email, password too short
Expected system condition	The code (retry, fallback)	Network timeout, rate-limited, lock contention
Disaster	Operators (via alert)	Database down, disk full, secrets revoked

The handling strategy follows from the class. Mixing them is where most error code goes wrong.
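
One way to keep the classes from blurring is to write the table down as data. The sketch below is illustrative only: `ErrorClass`, `Handling`, and `handlingFor` are hypothetical names, and the `action` strings paraphrase the table rather than prescribe anything.

```typescript
// Hypothetical names throughout: the classification table expressed as data.
type ErrorClass = "bug" | "user_error" | "system_condition" | "disaster";

interface Handling {
  owner: "developer" | "user" | "code" | "operator";
  action: string;
  alert: boolean;
}

const defaultHandling: Record<ErrorClass, Handling> = {
  bug: { owner: "developer", action: "log with full context, fix in code", alert: true },
  user_error: { owner: "user", action: "return actionable feedback", alert: false },
  system_condition: { owner: "code", action: "retry or fall back", alert: false },
  disaster: { owner: "operator", action: "fail fast and page", alert: true },
};

// Look up the default handling for a class; callers can still override per case.
function handlingFor(cls: ErrorClass): Handling {
  return defaultHandling[cls];
}
```

Because the lookup is a `Record` over the union, adding a fifth class without deciding who owns it fails to compile.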

2. Errors are values; exceptions are control flow

Predictable failures (validation, lookups, parsing): return them. The caller decides.
Unpredictable failures (DB down, OOM): throw. They propagate to a boundary.

Throwing for predictable failures forces every caller to either know to catch or to silently pass the buck. Returning for unpredictable failures forces every caller to check for things that shouldn't normally happen.
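
A minimal sketch of the split, assuming a hypothetical `Inventory` interface whose `reserve` may throw when its backing store is unreachable. Bad input comes back as a value; the infrastructure failure is left to propagate.

```typescript
type Parsed =
  | { ok: true; value: number }
  | { ok: false; reason: "empty" | "not_an_integer" | "out_of_range" };

// Predictable failure: bad input is expected, so it is returned, not thrown.
function parseQuantity(input: string): Parsed {
  const trimmed = input.trim();
  if (trimmed === "") return { ok: false, reason: "empty" };
  const n = Number(trimmed);
  if (!Number.isInteger(n)) return { ok: false, reason: "not_an_integer" };
  if (n < 1 || n > 100) return { ok: false, reason: "out_of_range" };
  return { ok: true, value: n };
}

// Hypothetical dependency: reserve() may throw if the store is unreachable.
interface Inventory {
  reserve(sku: string, qty: number): void;
}

function reserveFromInput(inv: Inventory, sku: string, input: string): Parsed {
  const parsed = parseQuantity(input);
  if (!parsed.ok) return parsed; // the caller decides what to tell the user
  inv.reserve(sku, parsed.value); // no try/catch: an outage propagates to a boundary
  return parsed;
}
```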

3. Catch at boundaries, not everywhere

try/catch belongs at:

Request boundaries (HTTP handler, RPC, queue worker) — turn exceptions into responses
Resource boundaries (file/network call) — to retry, fall back, or release resources
Trust boundaries (calls to untrusted code, plugins) — to contain damage

Anywhere else, let it propagate. A try/catch in the middle of business logic that swallows the error, or re-throws it without adding context, is almost always wrong.
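
As an illustration of a trust boundary, here is a sketch that runs a list of plugins and contains each failure. The `Plugin` and `PluginOutcome` shapes are assumptions made up for this example.

```typescript
// Hypothetical plugin shape: untrusted code that may throw anything.
type Plugin = { name: string; run: (input: string) => string };

interface PluginOutcome {
  name: string;
  ok: boolean;
  output?: string;
  error?: string;
}

function runPlugins(plugins: Plugin[], input: string): PluginOutcome[] {
  const outcomes: PluginOutcome[] = [];
  for (const p of plugins) {
    try {
      outcomes.push({ name: p.name, ok: true, output: p.run(input) });
    } catch (e) {
      // Trust boundary: one misbehaving plugin must not take down the rest.
      // The failure is recorded, not swallowed, so the caller can log it.
      const message = e instanceof Error ? e.message : String(e);
      outcomes.push({ name: p.name, ok: false, error: message });
    }
  }
  return outcomes;
}
```

The catch sits at the call into untrusted code and nowhere deeper; the plugins themselves are free to let errors propagate.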

4. Don't lose information

Every layer that catches an error should add context, not erase it.

// ❌ Loses the original
catch (e) {
  throw new Error("Failed to process order");
}

// ✅ Preserves the chain
catch (e) {
  throw new OrderProcessingError("Failed to process order 42", { cause: e });
}

In languages with error wrapping (JavaScript's Error.cause, Go's fmt.Errorf with %w, Rust's ? with thiserror's #[from] or #[source], Python's raise ... from), use it.
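
A sketch of what the preserved chain buys you at logging time. It sets `cause` by hand so it does not depend on the runtime supporting the `{ cause }` constructor option; `withContext`, `describeChain`, `chargeCard`, and `processOrder` are hypothetical names.

```typescript
type ChainedError = Error & { cause?: unknown };

// Wrap a lower-level failure with context while keeping the original reachable.
function withContext(message: string, cause: unknown): ChainedError {
  const err: ChainedError = new Error(message);
  err.name = "OrderProcessingError";
  err.cause = cause;
  return err;
}

// Flatten the chain into one log line: "outer <- inner <- root".
function describeChain(err: unknown): string {
  const parts: string[] = [];
  let current: unknown = err;
  // The depth cap guards against accidental cycles in the chain.
  for (let depth = 0; depth < 10 && current instanceof Error; depth++) {
    parts.push(`${current.name}: ${current.message}`);
    current = (current as ChainedError).cause;
  }
  return parts.join(" <- ");
}

// Stand-in for a lower layer that fails.
function chargeCard(): void {
  throw new Error("card gateway timeout");
}

function processOrder(orderId: number): void {
  try {
    chargeCard();
  } catch (e) {
    throw withContext(`Failed to process order ${orderId}`, e);
  }
}
```

Calling `describeChain` on the error thrown by `processOrder(42)` yields both the order context and the gateway timeout in a single line.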

Patterns by language family

TypeScript / JavaScript

Pragmatic split:

Exceptions for bugs and unexpected conditions
Tagged unions / Result types for domain errors

type Result<T, E = AppError> = { ok: true; value: T } | { ok: false; error: E };

async function getUser(id: string): Promise<Result<User, "not_found" | "forbidden">> {
  const row = await db.users.findById(id);
  if (!row) return { ok: false, error: "not_found" };
  if (!can(currentUser, "read", row)) return { ok: false, error: "forbidden" };
  return { ok: true, value: row };
}

Or libraries: neverthrow, Effect.
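
On the calling side, a tagged Result pairs well with an exhaustive switch. This sketch redefines a two-parameter `Result` locally so it stands alone; `GetUserError` and `statusFor` are hypothetical names.

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
type GetUserError = "not_found" | "forbidden";

// Map a getUser-style result to an HTTP status at the boundary.
function statusFor(result: Result<{ name: string }, GetUserError>): number {
  if (result.ok) return 200;
  switch (result.error) {
    case "not_found":
      return 404;
    case "forbidden":
      return 403;
    default: {
      // If a new code is added to GetUserError, this assignment stops compiling.
      const unreachable: never = result.error;
      throw new Error(`unhandled error code: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment is the point: the compiler, not a reviewer, notices when a new domain error has no mapping.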

Go

Idiomatic: return error as the last value. Wrap with context.

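func getUser(id string) (*User, error) {
    // store.FindUser stands in for your data-access layer; assume it returns ErrNotFound on a miss.
    user, err := store.FindUser(id)
    if err != nil {
        return nil, fmt.Errorf("getUser %s: %w", id, err)
    }
    return user, nil
}

// Caller: errors.Is sees through the %w wrapping.
user, err := getUser(id)
if errors.Is(err, ErrNotFound) { ... }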

Rust

Result<T, E> is built into the language. Use ? to propagate, thiserror to define error enums in libraries, and anyhow for application code where callers only need to report the error.

Python

Exceptions are idiomatic. Define a domain hierarchy:

class AppError(Exception): ...
class ValidationError(AppError): ...
class NotFoundError(AppError): ...
class ExternalServiceError(AppError):
    def __init__(self, service: str, cause: Exception):
        self.service = service
        super().__init__(f"{service} failed: {cause}")
        self.__cause__ = cause

Java / Kotlin

Checked exceptions become noise when every layer has to declare or catch them; wrap them in unchecked exceptions at the boundary. In Kotlin, prefer sealed-class results for domain errors.

PHP

Exceptions for unexpected; tagged returns / Result classes for domain. Laravel: lean into typed exceptions + a global handler that maps them to HTTP responses.

Error → user mapping

For user-facing errors, the error class should determine the response shape and message tone.

Class	HTTP	User message	Logging
Validation	400 / 422	Specific, actionable, per field	info
AuthN failure	401	Generic ("invalid credentials")	info (be careful with detail)
AuthZ failure	403	Generic ("not allowed")	warn (potentially malicious)
Not found	404	"Not found"	info
Conflict	409	What conflicts, what to do	info
Rate limit	429	Retry-After header + clear message	warn
External service	502 / 503	Generic ("temporary issue, try again")	error + alert
Unknown / bug	500	Generic + request ID	error + alert

Never leak internal details to the user. Stack traces in API responses are a security issue.
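
The table above can live in code as a single lookup, so every endpoint answers the same way. This is a sketch under assumptions: `ErrorKind`, `ResponsePolicy`, and `toResponse` are hypothetical names, and where the table offers two status codes the sketch picks one (422 for validation, 503 for external services).

```typescript
type ErrorKind =
  | "validation" | "authn" | "authz" | "not_found"
  | "conflict" | "rate_limit" | "external" | "unknown";
type LogLevel = "info" | "warn" | "error";

interface ResponsePolicy {
  status: number;
  level: LogLevel;
  alert: boolean;
  exposeDetails: boolean; // may the caller-supplied detail reach the user?
}

const policies: Record<ErrorKind, ResponsePolicy> = {
  validation: { status: 422, level: "info", alert: false, exposeDetails: true },
  authn: { status: 401, level: "info", alert: false, exposeDetails: false },
  authz: { status: 403, level: "warn", alert: false, exposeDetails: false },
  not_found: { status: 404, level: "info", alert: false, exposeDetails: false },
  conflict: { status: 409, level: "info", alert: false, exposeDetails: true },
  rate_limit: { status: 429, level: "warn", alert: false, exposeDetails: false },
  external: { status: 503, level: "error", alert: true, exposeDetails: false },
  unknown: { status: 500, level: "error", alert: true, exposeDetails: false },
};

function toResponse(kind: ErrorKind, requestId: string, details?: string) {
  const policy = policies[kind];
  const body: { code: ErrorKind; request_id: string; details?: string } = {
    code: kind,
    request_id: requestId,
  };
  // Details are dropped unless the policy explicitly allows them.
  if (policy.exposeDetails && details !== undefined) body.details = details;
  return { status: policy.status, level: policy.level, alert: policy.alert, body };
}
```

Defaulting `exposeDetails` to false for everything except validation and conflict means a stray SQL string passed as `details` never reaches a 500 response.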

Error → operator mapping

Condition	Source	Logging	Alerting
Bug (uncaught)	code path	`error` with full context, request ID, user ID	yes
External dep failed	network call	`warn` with attempt count	only if rate > threshold
Validation failed	user input	`info` with field	no
Permission denied	authz	`warn` with user/route	only on burst (probing)
Auth failure	login	`info`	only on burst (brute force)

Tune levels until pages reflect actionable problems. If an `error`-level log fires constantly, either change the level (it's not an error) or fix the cause (it shouldn't be happening).
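
"Only on burst" needs some notion of a window. A minimal in-memory sketch, assuming a hypothetical `BurstDetector` class and a caller that passes in the current time in milliseconds:

```typescript
// Sliding-window counter: signal when events cluster, not on every occurrence.
class BurstDetector {
  private timestamps: number[] = [];
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold: number, windowMs: number) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Record one event at time `now` (ms). Returns true once the window
  // holds at least `threshold` events, including this one.
  record(now: number): boolean {
    const cutoff = now - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    this.timestamps.push(now);
    return this.timestamps.length >= this.threshold;
  }
}
```

Each auth failure is still logged at `info`; only a `true` from `record` would escalate to an alert. A real deployment would more likely express this as a rate rule in the alerting system than in application memory.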

The boundary handler

Each request/job boundary should have one error handler that:

Catches anything not already caught
Maps domain error class → HTTP response (or job retry decision)
Logs with request ID, user ID, route, latency
Includes the request ID in the response (so users can quote it to support)
Reports to error tracker (Sentry / Rollbar / etc.) for unexpected errors

// Express-ish example. Express recognizes error middleware by its four-argument signature.
app.use((err, req, res, next) => {
  // If the response has already started, defer to Express's default handler.
  if (res.headersSent) return next(err);
  const requestId = req.id; // assumes earlier middleware assigned a request ID
  if (err instanceof ValidationError) {
    log.info({ requestId, err }, "validation_failed");
    return res.status(422).json({ error: { code: err.code, details: err.details, request_id: requestId } });
  }
  if (err instanceof NotFoundError) {
    log.info({ requestId, err }, "not_found");
    return res.status(404).json({ error: { code: "not_found", request_id: requestId } });
  }
  // Unexpected: log + alert
  log.error({ requestId, err }, "unhandled");
  sentry.captureException(err, { tags: { requestId } });
  res.status(500).json({ error: { code: "internal", request_id: requestId } });
});

Retries

Don't retry blindly. Retry on specific errors, with bounded attempts, exponential backoff with jitter, and idempotency.

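// isRetryable and sleep are helpers you supply.
async function callWithRetry<T>(fn: () => Promise<T>, opts = { tries: 3, base: 200 }): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < opts.tries; i++) {
    try {
      return await fn();
    } catch (e) {
      if (!isRetryable(e)) throw e;
      lastErr = e;
      if (i === opts.tries - 1) break; // out of attempts: don't sleep before giving up
      // Exponential backoff plus jitter so clients don't retry in lockstep.
      const delay = opts.base * 2 ** i + Math.random() * opts.base;
      await sleep(delay);
    }
  }
  throw lastErr;
}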

Retry on: timeouts, 502/503/504, transient network errors, lock contention. Do not retry on: 4xx (except 408, 425, 429), parse errors, auth failures.
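
The retry helper above leans on `isRetryable` and `sleep`. One possible shape for them, as a sketch: the `MaybeHttpError` interface is an assumption about what your HTTP client attaches to errors, and the code list covers only a few Node-style network error codes.

```typescript
// Hypothetical error shape: an optional HTTP status and an optional Node-style code.
interface MaybeHttpError {
  status?: number;
  code?: string;
}

const RETRYABLE_STATUS = [408, 425, 429, 502, 503, 504];
const RETRYABLE_CODES = ["ETIMEDOUT", "ECONNRESET", "EAI_AGAIN"];

// Allow-list, not deny-list: anything unrecognized is treated as permanent.
function isRetryable(e: unknown): boolean {
  if (typeof e !== "object" || e === null) return false;
  const err = e as MaybeHttpError;
  if (typeof err.status === "number") return RETRYABLE_STATUS.indexOf(err.status) !== -1;
  if (typeof err.code === "string") return RETRYABLE_CODES.indexOf(err.code) !== -1;
  return false;
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Checking structured fields keeps the decision independent of message wording; parse errors and auth failures fall through to `false` without being listed.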

Pair retries with a circuit breaker for downstream services.
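
A circuit breaker is a small state machine. The sketch below is deliberately minimal and makes several assumptions: `CircuitBreaker` is a hypothetical class, the caller supplies the clock, failures are counted consecutively, and the half-open state does not limit how many trial calls get through.

```typescript
type BreakerState = "closed" | "open" | "half_open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;

  constructor(maxFailures: number, cooldownMs: number) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  // Should the next call be attempted at time `now` (ms)?
  allow(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half_open"; // cooldown elapsed: let trial calls through
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    // A failed trial call reopens immediately; otherwise open at the threshold.
    if (this.state === "half_open" || this.failures >= this.maxFailures) {
      this.state = "open";
      this.openedAt = now;
    }
  }

  current(): BreakerState {
    return this.state;
  }
}
```

Usage is to check `allow(now)` before each downstream call and fail fast when it returns false, then report the outcome with `recordSuccess` or `recordFailure`. Retries then stop hammering a service that is already down.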

Idempotency

Retries require idempotency. Without it, retries cause duplicate side effects.

Reads are naturally idempotent
Writes: use an idempotency key sent by the client; cache the response for N hours; replay if seen again
Outbound calls: include a unique key the receiver can dedupe on
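
The write path can be sketched as a keyed cache that replays the first response. This is an in-memory, single-process, synchronous illustration with a hypothetical `IdempotencyCache` class; a real service would need a shared store and a way to handle two requests with the same key arriving concurrently.

```typescript
interface StoredResponse<T> {
  response: T;
  expiresAt: number;
}

class IdempotencyCache<T> {
  private entries = new Map<string, StoredResponse<T>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Run `work` at most once per key within the TTL; otherwise replay the stored response.
  run(key: string, now: number, work: () => T): { response: T; replayed: boolean } {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > now) {
      return { response: hit.response, replayed: true };
    }
    const response = work();
    this.entries.set(key, { response, expiresAt: now + this.ttlMs });
    return { response, replayed: false };
  }
}
```

If `work` throws, nothing is stored, so a retry with the same key runs it again instead of replaying a failure.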

Error message style

Messages have three audiences: developer, operator, user. Don't try to write one string for all three.

Audience	Where	Tone
Developer	Logs, stack traces	Technical, specific, full IDs
Operator	Alerts, dashboards	Symptom + runbook link
User	UI, API response	Plain language, no jargon, actionable

Each gets a tailored string. The log message is not the user message.
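
As a sketch, a single failure can carry all three strings from the point where it is created. The function name, the field names, and the runbook name `payments-timeout` are made up for illustration.

```typescript
interface AudienceMessages {
  developer: string; // goes to logs
  operator: string; // goes to alerts and dashboards
  user: string; // goes to the UI or API response
}

function paymentTimeoutMessages(orderId: string, gateway: string, elapsedMs: number): AudienceMessages {
  return {
    developer: `charge timed out: order=${orderId} gateway=${gateway} elapsed_ms=${elapsedMs}`,
    operator: `Payment gateway ${gateway} is timing out; see runbook "payments-timeout"`,
    user: "We couldn't complete your payment. Please try again in a few minutes.",
  };
}
```

Only the `user` string is ever serialized into a response; internal IDs and gateway names stay in the other two.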

Output format

When reviewing error handling:

## Error handling review: <scope>

**Current style:** <exceptions everywhere / mixed / Result types / …>

### Strengths
- ✅ <thing>

### Gaps

#### 🔴 Lost information / dangerous swallowing
- `path:42` — catches everything, logs nothing, returns null

#### 🟠 Wrong layer
- `path:88` — try/catch around business logic that should propagate

#### 🟡 Inconsistent or noisy
- Same error class returned 3 different HTTP codes
- `error` log firing for expected validation failures (false alerts)

### Proposed strategy
- **Throw** for: bugs, infrastructure failures
- **Return** for: validation, lookup misses, business-rule denials
- **Catch at**: HTTP handler, queue worker, external-call site
- **Domain error hierarchy:** `AppError → {ValidationError, NotFoundError, ExternalServiceError, …}`
- **Boundary handler:** mapped table of class → HTTP code → log level

### Action items
1. Add boundary handler (see template)
2. Replace silent `catch (e) {}` instances (n found)
3. Add request_id propagation
4. …

Anti-patterns

❌ catch (e) {} — silent swallow. The bug becomes invisible.
❌ catch (e) { console.log(e); throw e; } — pure noise, no added value
❌ Returning null for both "not found" and "error" — caller can't distinguish
❌ Exceptions for control flow (throw new EndOfLoop())
❌ One giant try { everything } catch { showError() } at the top level
❌ Leaking stack traces / SQL / internal IDs to users
❌ Error messages without IDs — users say "I got an error" and you can find nothing
❌ Retrying 4xx errors (especially 400, 401, 403) — they won't get better
❌ Retrying without backoff — DDoS your own backend
❌ Reporting every validation failure to Sentry — drowns real bugs
❌ Different error formats per endpoint
❌ if (err.message.includes("not found")) — never type-check via string matching
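
For the last item, a sketch of the alternative: match on a structured `code` field, not on message text. `CodedError` and `hasCode` are hypothetical names; an `instanceof` check against your own error classes serves the same purpose.

```typescript
interface CodedError {
  code: string;
}

// Type guard: true only when the value carries exactly the given code.
function hasCode(e: unknown, code: string): e is CodedError {
  return typeof e === "object" && e !== null && (e as { code?: unknown }).code === code;
}
```

Rewording a message no longer changes behavior, and an unrelated error that happens to say "not found" is not misclassified.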

Tips

A request ID on every log line and every error response is the single highest-leverage change you can make.
One error format across the API. Document it.
Convert legacy throws to typed errors gradually at the boundary. Don't try to rewrite all at once.
Test the error paths. They're famously under-tested and famously where prod incidents live.
Read your own logs. Once a week, scan recent error logs. If you don't recognize them, they're either real bugs or noise — both worth fixing.

References

Joe Armstrong on "Let it crash" — Erlang's approach
Rust Book: Error Handling chapter
Go blog: error handling and Go
Stripe API errors — a polished error-response design
Designing Data-Intensive Applications — Reliability chapter
