When a SaaS platform fails in production, the first question is rarely “what’s the bug?” It’s “what just happened, who is impacted, and what changed?” In Laravel apps, especially ones with queues, third-party integrations, and multi-tenant data models, you can’t answer those questions from a stack trace alone.
That’s where Laravel observability comes in: designing your system so you can understand its behavior from the outside, using logs, metrics, and traces (often called the three pillars).
This guide is written for CTOs, founders, and product-driven operators running business-critical Laravel systems. The goal is practical: what to instrument first, how to wire it into Laravel, and how to avoid the most common traps that make “monitoring” feel like noise.
Monitoring is usually about known failure modes. You set alerts on CPU, error rate, queue depth, and you get paged when thresholds break.
Observability is about answering new questions under pressure, including the ones you did not predict:
- Why did checkout slow down only for EU tenants?
- Why are retries spiking only for one Stripe webhook event type?
- Which deployment introduced the N+1 query that is now melting the database?
In a mature SaaS, the unknowns are constant. New tenants, new integrations, new data, new load patterns. Observability is what keeps that complexity from turning into operational chaos.
You want all three pillars because each answers a different class of question.
| Pillar | Best for answering | Typical SaaS examples | What it misses if used alone |
|---|---|---|---|
| Logs | “What happened?” (event details) | Payment failed, permission denied, webhook signature invalid | Hard to aggregate trends, hard to quantify impact |
| Metrics | “How bad is it?” (rates, percentiles, saturation) | p95 latency, error rate, queue depth, DB connections | Can’t explain the exact why without context |
| Traces | “Where did time go?” (end-to-end causality) | Request -> DB -> cache -> HTTP call -> queue job | Sampling can hide rare issues, needs good instrumentation |
If you only pick one, logs are usually where teams start. But most SaaS incidents resolve faster when you can pivot from a metric spike to a trace, then use logs for the details.
A reasonable baseline for a production Laravel SaaS usually includes:
- Centralized structured logs (JSON), with consistent context (tenant, request ID, user ID) and a clear policy for PII.
- Core service metrics: request rate, error rate, latency percentiles, queue depth, job failure rate, database saturation, cache hit ratio if relevant.
- Distributed traces across HTTP requests, queue jobs, and external calls (Stripe, QuickBooks, Google APIs, internal services).
- Release markers so you can correlate “things got worse” with deployments.
- Alerting that maps to user impact, plus runbooks.
If this feels like a lot, start with what reduces mean time to resolution (MTTR) for your most expensive failures: broken checkout, broken onboarding, delayed processing, data integrity bugs.
Laravel’s logging is powered by Monolog. Out of the box it’s easy to write log lines, but “easy to write” is not the same as “easy to use at 2 a.m.”
Plain text logs become painful the moment you need to filter by tenant, correlate a background job with an HTTP request, or group failures by integration.
A pragmatic pattern is:
- Output logs as JSON.
- Ensure every log line includes a request ID and (for SaaS) a tenant identifier.
- Log events, not novels: aim for high-signal records that are easy to query.
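Getting JSON output usually means attaching Monolog’s JsonFormatter to your existing channel. A minimal sketch for config/logging.php is below, assuming a “daily” file channel; the path, retention, and channel name are placeholders to adapt to your own setup.

```php
// config/logging.php (excerpt) -- a minimal sketch, assuming a "daily" file channel.

'channels' => [
    'daily' => [
        'driver' => 'daily',
        'path' => storage_path('logs/laravel.log'),
        'level' => env('LOG_LEVEL', 'debug'),
        'days' => 14,

        // Emit one JSON object per line so log shippers and query tools can
        // index fields like request_id and tenant_id directly.
        'formatter' => Monolog\Formatter\JsonFormatter::class,
    ],
],
```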
Laravel supports adding shared context via Log::withContext() (see the Laravel logging documentation). A common approach is to add middleware that attaches request-scoped context.
```php
// app/Http/Middleware/LogContext.php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Str;

class LogContext
{
    public function handle(Request $request, Closure $next)
    {
        // Reuse an upstream request ID if a proxy or gateway set one; otherwise generate our own.
        $requestId = $request->headers->get('X-Request-Id') ?? (string) Str::uuid();

        // Attach context before handling the request so every log line written
        // while it runs (controllers, listeners, etc.) carries these fields.
        Log::withContext([
            'request_id' => $requestId,
            'tenant_id' => optional($request->user())->tenant_id,
            'user_id' => optional($request->user())->id,
            'route' => optional($request->route())->getName(),
        ]);

        $response = $next($request);

        // Put the ID on the response too so support can ask customers for it.
        $response->headers->set('X-Request-Id', $requestId);

        return $response;
    }
}
```
The point is not the exact fields, it’s that your whole team agrees on a small, stable set.
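For completeness, the middleware needs to be registered globally so even error responses carry the ID. A sketch of the Laravel 11+ style registration in bootstrap/app.php follows; routing and exception configuration are omitted, and on Laravel 10 and earlier you would append the class to the middleware groups in app/Http/Kernel.php instead.

```php
// bootstrap/app.php (excerpt) -- Laravel 11+ style; routing/exception config omitted.

use App\Http\Middleware\LogContext;
use Illuminate\Foundation\Application;
use Illuminate\Foundation\Configuration\Middleware;

return Application::configure(basePath: dirname(__DIR__))
    ->withMiddleware(function (Middleware $middleware) {
        // Append globally so every request gets a request ID and tenant context.
        $middleware->append(LogContext::class);
    })
    ->create();
```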
If you train your system to scream, people stop listening.
- info: important business events (subscription canceled, import started).
- warning: recoverable problems (retrying webhook processing).
- error: failed operations that require attention.
- critical: data integrity risk, security failures, customer-impacting outage.
Good logs are also event-shaped. Prefer:
```php
Log::warning('stripe.webhook.signature_invalid', [
    'stripe_event_id' => $eventId,
    'source_ip' => $ip,
]);
```

over:

```php
Log::warning("Stripe webhook signature invalid for event {$eventId} from {$ip}");
```
The first is queryable and consistent.
Sensitive data in logs is where teams get burned, especially when debugging auth, payments, and integrations.
Practical policies:
- Never log raw access tokens, API keys, or passwords.
- Be cautious with request bodies and headers (authorization headers are a classic leak).
- Treat emails, phone numbers, and addresses as PII unless you have a clear retention and access policy.
For regulated workflows, enforce this with code review rules and automated scanning where feasible.
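One lightweight way to enforce the policy at the call site is a small scrubbing helper that every integration payload passes through before it is logged. This is a sketch; the class name and key list are illustrative and should be extended to match your own payloads.

```php
// app/Support/LogScrubber.php -- a minimal sketch; the key list is illustrative.

namespace App\Support;

class LogScrubber
{
    private const SENSITIVE_KEYS = [
        'password', 'token', 'access_token', 'refresh_token',
        'authorization', 'api_key', 'secret', 'card_number', 'cvc',
    ];

    /**
     * Recursively replace sensitive values before they reach the logger.
     */
    public static function scrub(array $context): array
    {
        foreach ($context as $key => $value) {
            if (is_array($value)) {
                $context[$key] = self::scrub($value);
            } elseif (in_array(strtolower((string) $key), self::SENSITIVE_KEYS, true)) {
                $context[$key] = '[redacted]';
            }
        }

        return $context;
    }
}

// Usage: Log::info('quickbooks.customer.synced', LogScrubber::scrub($payload));
```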
Laravel Telescope is excellent for local development and staging investigations. In production, you need to be intentional about data volume, retention, and sensitive data exposure.
If you do use it in production, do it with strict gates (auth, sampling, limited watchers) and assume it is not your primary incident tool.
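Those gates live in the published TelescopeServiceProvider. A sketch of a restrictive production setup might look like the following; the entry filters shown are the high-signal defaults, and the email allow-list is obviously a placeholder.

```php
// app/Providers/TelescopeServiceProvider.php (excerpt) -- a sketch of a
// restrictive production configuration.

use Illuminate\Support\Facades\Gate;
use Laravel\Telescope\IncomingEntry;
use Laravel\Telescope\Telescope;

public function register(): void
{
    // In production, record only high-signal entries instead of everything.
    Telescope::filter(function (IncomingEntry $entry) {
        if ($this->app->environment('local')) {
            return true;
        }

        return $entry->isReportableException()
            || $entry->isFailedRequest()
            || $entry->isFailedJob()
            || $entry->isScheduledTask()
            || $entry->hasMonitoredTag();
    });
}

protected function gate(): void
{
    // Only named operators can open the Telescope UI outside local.
    Gate::define('viewTelescope', function ($user) {
        return in_array($user->email, ['ops@example.com']); // placeholder allow-list
    });
}
```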
Metrics answer “how bad is it?” and “is it getting worse?” They also help you catch regressions before customers file tickets.
If you’re trying to build a first-pass dashboard, start with a small set that maps to customer impact.
| Area | Metric | Why it matters |
|---|---|---|
| Web | Request rate + error rate | Detect broken releases and dependency failures |
| Web | Latency percentiles (p50/p95/p99) | Averages hide pain, percentiles reveal it |
| Queues | Queue depth and age | “Are we keeping up?” for background work |
| Queues | Job success/failure rate | Detect poison messages and integration drift |
| DB | Connections, CPU, slow queries | Most Laravel incidents eventually involve DB pressure |
| External APIs | Error rate + latency | Stripe, QuickBooks, Google APIs become part of your system |
In Laravel, operational load is not just “requests.” It’s also:
- Queue workers doing heavy lifting.
- Scheduled jobs.
- Webhooks.
- Long-running imports.
If you only instrument web requests, your dashboards will look “fine” while customers wait 45 minutes for background processing.
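Laravel fires queue events you can hook for exactly this. A minimal sketch that emits structured duration and failure events (which your metrics pipeline can then count and graph) is below; it assumes it lives in a service provider’s boot method and that the `queue.job.*` event names are your own convention.

```php
// app/Providers/AppServiceProvider.php (excerpt) -- a sketch of queue
// instrumentation using Laravel's built-in queue events.

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Queue\Events\JobProcessed;
use Illuminate\Queue\Events\JobProcessing;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

public function boot(): void
{
    // Per worker process; jobs are processed sequentially, so a shared
    // start timestamp is enough for duration measurement.
    $startedAt = null;

    Event::listen(JobProcessing::class, function () use (&$startedAt) {
        $startedAt = microtime(true);
    });

    Event::listen(JobProcessed::class, function (JobProcessed $event) use (&$startedAt) {
        Log::info('queue.job.processed', [
            'job' => $event->job->resolveName(),
            'queue' => $event->job->getQueue(),
            'duration_ms' => $startedAt ? round((microtime(true) - $startedAt) * 1000) : null,
        ]);
    });

    Event::listen(JobFailed::class, function (JobFailed $event) {
        Log::error('queue.job.failed', [
            'job' => $event->job->resolveName(),
            'queue' => $event->job->getQueue(),
            'exception' => get_class($event->exception),
        ]);
    });
}
```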
Many teams land on one of these patterns:
- Prometheus + Grafana for a strong open-source metrics stack.
- A hosted APM platform (Datadog, New Relic, etc.) that includes metrics, traces, and logs.
The right choice depends on your org. If you already run Prometheus, add Laravel metrics there. If you need speed and operational simplicity, a hosted platform can be worth it.
Distributed tracing is the difference between:
- “The app is slow”
- “The app is slow because 18 percent of requests are waiting on one external API call, and the retries are saturating queue workers”
OpenTelemetry (OTel) has become the standard for generating traces and exporting them to a backend (Jaeger, Honeycomb, Datadog, New Relic, etc.). Start here: the OpenTelemetry project.
In Laravel, tracing is most valuable when it covers:
- Incoming HTTP requests
- Database queries (at least slow ones)
- Outbound HTTP calls (Stripe, internal APIs)
- Queue jobs and the chain of work they spawn
A trace is only useful if the context flows through the system:
- Web request creates a trace
- The queue job spawned by that request continues the trace
- Subsequent outbound HTTP calls carry the trace headers
In SaaS, this becomes especially useful when a single customer action triggers multiple jobs and multiple integrations.
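What the propagation looks like depends on your tracing setup. As one sketch, assuming the open-telemetry/api package is installed and a tracer provider is configured elsewhere, you can inject the active trace context into outbound Laravel HTTP client calls as W3C trace headers; the endpoint and payload below are placeholders.

```php
// A sketch, assuming the open-telemetry/api package is installed and configured.

use Illuminate\Support\Facades\Http;
use OpenTelemetry\API\Trace\Propagation\TraceContextPropagator;

$headers = [];

// Writes the active span's context into $headers as "traceparent" (and
// "tracestate" when present), so the downstream service continues the trace.
TraceContextPropagator::getInstance()->inject($headers);

$response = Http::withHeaders($headers)
    ->timeout(10)
    ->post('https://internal-billing.example.com/invoices', [
        'tenant_id' => $tenantId, // illustrative payload
    ]);
```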
Auto-instrumentation is a starting point. The real payoff comes when you add spans around meaningful operations:
- “calculate invoice totals”
- “provision tenant”
- “sync quickbooks customer”
That gives you a performance and failure map that matches how the business thinks, not just how the code is layered.
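As one hedged sketch of such a span, again assuming the open-telemetry/api package is configured: wrap the invoice-total calculation in a named span with a tenant attribute. The InvoiceCalculator service and $invoice variable are illustrative, not part of Laravel.

```php
// A sketch of a business-level span, assuming open-telemetry/api is configured.

use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\StatusCode;

$tracer = Globals::tracerProvider()->getTracer('app');

$span = $tracer->spanBuilder('calculate invoice totals')->startSpan();
$scope = $span->activate();

try {
    $span->setAttribute('tenant.id', $invoice->tenant_id);

    $totals = app(InvoiceCalculator::class)->calculate($invoice); // illustrative service
} catch (\Throwable $e) {
    // Mark the span as failed so it stands out in the trace view.
    $span->recordException($e);
    $span->setStatus(StatusCode::STATUS_ERROR);
    throw $e;
} finally {
    $scope->detach();
    $span->end();
}
```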
A common incident workflow looks like this:
1. Alert fires: elevated error rate for the checkout endpoint.
2. Metrics confirm impact and scope: only EU region, p95 latency increased.
3. Traces show time is spent waiting on an external API call, plus retries.
4. Logs reveal the exact error type (timeouts, auth failure, bad payload) and the affected tenant(s).
5. Team applies mitigation: circuit breaker, reduced retry pressure, rollback, or hotfix.
If any one of these pillars is missing, you usually waste time.
Laravel applications have a few predictable areas where observability pays back quickly.
| Hotspot | What to watch | Why it’s common |
|---|---|---|
| Queue workers | Job duration, retries, failures, dead-letter behavior | SaaS systems lean on async work and webhooks |
| Webhooks | Signature validation failures, idempotency conflicts, replay rate | Payment and accounting systems resend events |
| Multi-tenancy | tenant_id in all signals, cross-tenant access violations | Tenant scoping bugs are high-risk |
| Eloquent performance | Slow queries, N+1 patterns, query count per request | ORM convenience can hide expensive access patterns |
| Third-party APIs | Latency and error budgets per integration | Your reliability becomes dependency reliability |
| File processing | S3 errors, timeouts, large payload behavior | Imports and exports are common in B2B SaaS |
If you want a deeper risk lens on production Laravel systems, Ravenna’s perspective in Laravel code audits aligns closely with where observability gaps tend to hide.
A mature alerting setup is not “alert on everything.” It’s alert on user-impacting conditions.
A simple starting point:
- Page on high error rate (for key user journeys)
- Page on queue not keeping up (depth or age)
- Page on sustained latency regressions (p95/p99)
- Page on dependency failure (Stripe down, email provider down)
Then write down what “good” is in terms of service objectives. Even lightweight SLOs help teams make sane trade-offs.
If you cannot answer “what changed?” quickly, your incident response will be slower than it should be.
At minimum, capture:
- git SHA or build ID
- deployment time
- feature flags toggled
Then ensure those values appear in logs and traces, and ideally as annotations in your dashboarding tool.
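One low-effort version, assuming you expose the build SHA as an environment variable at deploy time (APP_RELEASE is an illustrative name) and add a matching key to config/app.php, is to stamp it onto every log line. Log::shareContext() is available on recent Laravel versions; older versions can achieve something similar with Log::withContext().

```php
// app/Providers/AppServiceProvider.php (excerpt) -- a sketch; APP_RELEASE is an
// illustrative env variable your deploy pipeline would set to the git SHA.

use Illuminate\Support\Facades\Log;

public function boot(): void
{
    Log::shareContext([
        // Assumes a 'release' key in config/app.php that reads env('APP_RELEASE').
        'release' => config('app.release'),
    ]);
}
```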
Teams often start by logging everything. Costs spike, signal disappears.
Fix: decide what you need logs to do.
- High-cardinality debugging details belong in traces or sampled logs.
- Business events should be consistent and queryable.
- Error logs should include enough context to act.
In Laravel SaaS, the real work often happens in queues. If job logs have no request ID or trace context, you lose causality.
Fix: propagate context into jobs. At a minimum, pass and log a correlation ID.
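A minimal sketch of that hand-off: the dispatching code passes the request ID into the job’s constructor, and the job re-attaches it to the log context before doing any work. The job name and payload are illustrative.

```php
// app/Jobs/SyncQuickbooksCustomer.php -- an illustrative job showing the
// correlation-ID hand-off; the actual sync work is elided.

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class SyncQuickbooksCustomer implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        public int $customerId,
        public string $requestId, // carried over from the originating web request
    ) {
    }

    public function handle(): void
    {
        // Re-attach the correlation ID so every log line in this job can be
        // tied back to the HTTP request that dispatched it.
        Log::withContext(['request_id' => $this->requestId]);

        Log::info('quickbooks.customer.sync_started', ['customer_id' => $this->customerId]);

        // ... actual sync work ...
    }
}

// Dispatching side, e.g. in a controller where the LogContext middleware already ran:
// SyncQuickbooksCustomer::dispatch($customer->id, $requestId);
```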
Dashboards look impressive, but nobody checks them, and alerts have no runbooks.
Fix: each alert should have:
- an owner (team or person)
- a first-response checklist
- a link to the dashboard and trace view
If your dashboards only show infrastructure, you miss what customers experience.
Fix: add product-level signals.
Examples: signup completion rate, webhook processing delay, invoice export success rate. These are often the earliest warning signs.
If you’re starting from “we have some logs,” here’s a practical sequencing that works well for SaaS.
1. Add request IDs and tenant IDs to log context.
2. Switch to structured JSON logs (or ensure your platform parses key-value logs reliably).
3. Create a logging policy for PII and secrets.
4. Web: request rate, error rate, latency percentiles.
5. Queues: depth/age, failures, duration.
6. DB: slow query visibility and key saturation indicators (see the slow-query sketch after this list).
7. Add trace instrumentation for incoming HTTP requests.
8. Ensure outgoing HTTP calls are traced.
9. Add a small number of custom spans for critical workflows.
10. Create 5 to 10 high-quality alerts tied to customer impact.
11. Add release annotations.
12. Write short runbooks for the alerts that page humans.
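For the slow-query item above, Laravel’s query listener is often enough to get started. A sketch follows; the 500 ms threshold is an arbitrary starting point, and bindings are deliberately left out of the log line to avoid leaking PII.

```php
// app/Providers/AppServiceProvider.php (excerpt) -- a sketch of basic slow-query
// visibility; tune the threshold to your own workload.

use Illuminate\Database\Events\QueryExecuted;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

public function boot(): void
{
    DB::listen(function (QueryExecuted $query) {
        if ($query->time > 500) { // $query->time is in milliseconds
            Log::warning('db.slow_query', [
                'sql' => $query->sql, // bindings intentionally omitted (PII risk)
                'duration_ms' => $query->time,
                'connection' => $query->connectionName,
            ]);
        }
    });
}
```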
If you want an operational view of what “real” web app delivery includes beyond writing code, Ravenna covers this in what you really get with web app development services, including the operational maturity many teams discover they need after launch.
Do I need logs, metrics, and traces, or can I start with just one? Start with logs if you must, but you will resolve incidents faster once you have metrics and traces. Metrics tell you impact, traces show causality, and logs provide details.
Is Laravel Telescope enough for production observability? Telescope is great for debugging and staging, but it is not a full observability solution. Most SaaS teams still need centralized logging, metrics, and tracing for production.
What should I tag every log line with in a multi-tenant SaaS? At minimum: request ID, tenant ID, and a stable identifier for the actor (user ID or service account). Add route name and job name where relevant.
How do I avoid logging sensitive data in Laravel? Treat request bodies, headers, and exception contexts as dangerous by default. Define a policy, scrub known sensitive fields, and avoid dumping raw payloads from integrations.
What’s the biggest observability gap you see in Laravel SaaS apps? Missing correlation across boundaries: web request to queue jobs to third-party API calls. Without propagation, you spend time guessing instead of tracing.
How many alerts should we have? Fewer than you think. Start with a small set of high-quality alerts that map directly to customer impact, then expand only when you can respond consistently.
If your Laravel SaaS is at the stage where uptime, data integrity, and predictable delivery matter more than shipping demos, observability is not optional. It is part of the effort to reduce operational risk.
Ravenna is a senior Laravel consultancy (and official Laravel Partner) that helps teams design and evolve durable systems, including production-grade logging, metrics, tracing, and incident-ready operations. If you want a second set of senior eyes on your current setup, or a plan to get from “we have logs” to real observability, Contact Us and let's collaborate on the best path forward.