The way we tend to think about Observability is as follows:
Observability is an emergent property of complex systems. Without it as a discipline, entropy in the environment grows as systems mature in complexity.
The discipline of Observability is fundamentally about improving software delivery and operations. This is a socio-technical domain requiring cross-functional commitment from multiple customer teams.
When implemented correctly, Observability acts as an amplifying function for forward-thinking Product Teams and is necessary to meet the needs of modern architectures.
To go further, we must ask some fundamental questions:
How does one describe a complex system?
Control Theory suggests that we can estimate a system’s internal state from its accessible outputs. If we could measure all internal variables simultaneously, we would have a complete description of the system’s state at that point in time. This is not practical with today’s applications because of high cardinality and dimensionality: there is simply too much data to store and mine. Instead, we are often limited to a subset of variables, or sensor data, with which to interpret our complex system.
How do we identify the optimal sensors/outputs for targeted observability?
Thankfully, the internal variables of a complex system are rarely independent of each other. Interactions between components often induce interdependencies, so a well-chosen subset of variables/data often carries enough information to reconstruct the system’s complete internal state, making the system observable.
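To make the control-theory idea concrete, here is a minimal sketch of the classic rank test for a linear system. Everything in it is an assumption chosen for illustration: the toy two-variable system (position and velocity), the matrices A and C, and the helper names. Real software systems are not linear state-space models, so treat this as an analogy rather than a method.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Matrix rank via Gaussian elimination on exact fractions."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1), where n is the number of state variables."""
    blocks, current = [], C
    for _ in range(len(A)):
        blocks.extend(current)
        current = mat_mul(current, A)
    return blocks


def is_observable(A, C):
    """The system is observable when the stacked matrix has full rank n."""
    return rank(observability_matrix(A, C)) == len(A)


def reconstruct_initial_state(y0, y1):
    """With a position-only sensor: y0 = position, y1 = position + velocity."""
    return (y0, y1 - y0)


# Hypothetical toy system: state = [position, velocity], x[k+1] = A x[k]
A = [[1, 1],
     [0, 1]]
C_position = [[1, 0]]  # sensor reports position only
C_velocity = [[0, 1]]  # sensor reports velocity only

print(is_observable(A, C_position))  # position readings reveal velocity too
print(is_observable(A, C_velocity))  # velocity readings never reveal position
print(reconstruct_initial_state(5, 8))
```

Because position depends on velocity, a position-only sensor is enough to recover both variables, while a velocity-only sensor is not. This is the interdependency argument in miniature: the choice of output matters more than the number of outputs.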
How is this achieved in practice?
Advanced enterprises often achieve their goal of observability either through sampling, which provides statistically valuable insights, or by retaining higher-fidelity information for a shorter period. In both cases, the data leveraged is typically MELT data (metrics, events, logs, traces) that is augmented and correlated with additional metadata (e.g., unique context IDs, software versions, service tags, change records, topology data, baggage information, etc.).
These techniques allow the practitioner to ask arbitrary questions about the system, but sophisticated analysis tools and techniques are often required to truly interpret all of the information.
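As a rough sketch of how sampling and correlation can work together, the example below makes a deterministic keep-or-drop decision per context ID, so that every record sharing that ID is retained or discarded as a unit, and then groups the surviving records by that ID. The record fields, the `trace_id` key, the sample values, and the function names are all hypothetical; production pipelines typically rely on their tooling's own samplers.

```python
import hashlib
from collections import defaultdict


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministic sampling: the same trace_id always gets the same decision."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # integer in [0, 2**64)
    threshold = int(sample_rate * 2**64)         # keep buckets below this value
    return bucket < threshold


def correlate(records, sample_rate):
    """Group sampled MELT records by their shared context ID."""
    by_trace = defaultdict(list)
    for record in records:
        if keep_trace(record["trace_id"], sample_rate):
            by_trace[record["trace_id"]].append(record)
    return dict(by_trace)


# Hypothetical MELT records, already enriched with metadata
records = [
    {"kind": "trace", "trace_id": "a1", "service": "checkout",
     "version": "1.4.2", "duration_ms": 840},
    {"kind": "log", "trace_id": "a1", "service": "checkout",
     "version": "1.4.2", "message": "payment timeout"},
    {"kind": "event", "trace_id": "a1", "service": "checkout",
     "change_record": "example-change-1"},
    {"kind": "log", "trace_id": "b2", "service": "search",
     "version": "2.0.0", "message": "ok"},
]

for trace_id, group in correlate(records, sample_rate=1.0).items():
    print(trace_id, [r["kind"] for r in group])
```

Hashing the ID rather than drawing a random number per record is the design choice that matters here: a sampled trace keeps its logs, events, and spans together, so the correlated view stays complete for whatever is retained.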
Is correlating MELT data enough to make a complex system truly observable?
Deploying Cribl, Vector, DSP, etc. as an Observability Pipeline will not deliver observability.
Deploying Dynatrace, Datadog, Honeycomb, Nobl9, etc. as Observability Platforms will not deliver observability.
Deploying ServiceNow, BigPanda, etc. as Aggregation and Correlation Platforms will not deliver observability.
What is required to make a complex system observable extends well beyond state representation alone. Observability is socio-technical, requiring the mindset of all participants in a system to be inquisitive and diligent. It relies on methods and processes that evaluate continuous signals to implement control, as well as continuous process improvement that is both measured and measurable. This is acquired over time through sustained commitment.
Below are a few tips to consider as you progress on your journey:
Consider approaches that promote vendor agnosticism for flexibility of choice
Consider your own organizational dynamics in order to develop adoption programs for sustainability
Develop Central Tendency Metrics-based programs to demonstrate overall progress back to the organization
Incorporate abstractions that are developer-friendly and developer-focused to reduce lost engineering cycles
Understand the significance of stochastic processes and how to incorporate observability into developer DNA
Emphasize telemetry collection as a commoditized function and understand the value of context propagation at the edge
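As one illustration of the last tip, context propagation at the edge can be sketched with the W3C Trace Context `traceparent` header, which carries a version, a 32-hex-character trace ID, a 16-hex-character parent ID, and flags. The function names and the plain-dict header handling below are assumptions made for brevity (real HTTP header names are case-insensitive, and an instrumentation library would normally do this for you).

```python
import re
import secrets

# Version 00 layout: 00-<trace-id: 32 hex>-<parent-id: 16 hex>-<flags: 2 hex>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def start_or_continue(headers: dict) -> dict:
    """At the edge: continue a valid incoming context, else start a new trace."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    valid = (
        match is not None
        and match.group(1) != "0" * 32   # all-zero trace ID is invalid
        and match.group(2) != "0" * 16   # all-zero parent ID is invalid
    )
    if valid:
        trace_id, flags = match.group(1), match.group(3)
    else:
        trace_id, flags = secrets.token_hex(16), "01"  # new trace, sampled
    # Each hop gets its own span ID; the trace ID is what ties hops together.
    return {"trace_id": trace_id, "span_id": secrets.token_hex(8), "flags": flags}


def outgoing_headers(ctx: dict) -> dict:
    """Headers to attach to downstream calls so the context keeps flowing."""
    return {"traceparent": f"00-{ctx['trace_id']}-{ctx['span_id']}-{ctx['flags']}"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
ctx = start_or_continue(incoming)
print(outgoing_headers(ctx))
```

Starting the context at the first point of entry means every downstream metric, event, log, and trace can carry the same ID, which is what makes later correlation cheap.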
Our next post will focus on some common challenges we have observed that are driving the need for Observability in the enterprise.