Since the release of Self-Balancing Clusters (SBC) in Confluent Platform 6.0, we have been blown away by the reception and by the real operational benefits users are getting from it. Self-Balancing Clusters is a feature in Confluent Platform 6.0 and later that automatically rebalances uneven workloads and handles topology changes (adding or removing brokers). It does this through a background process that continuously checks a variety of metrics to determine if and when a rebalance should occur. It’s truly a “set it and forget it” type of tool, and the operational benefit of not having to manually monitor the cluster and trigger partition reassignments is huge.
One of the common themes we heard from our users was that while SBC works well, they would love a bit more visibility into what SBC is actually doing at a given time. In Confluent Platform 6.2, we gave our users more visibility with the addition of the add and remove broker tasks API.
The balancer status API is new and does exactly what its name suggests: it provides visibility into the state of the SBC component itself. Because SBC relies on metrics collection to make decisions, it can take some time to ramp up the first time it’s enabled. It’s therefore important for our users to know when SBC is ready for work, especially when relying on it for operator-initiated tasks like adding or removing brokers.
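One way to put this to use is to hold operator-initiated tasks until the balancer reports that it is ready. The snippet below is a minimal sketch of that idea, not Confluent's client code. `fetch_status` is a hypothetical callable standing in for whatever call returns the balancer status, and the `status` field and the `"ENABLED"` value are assumptions made for illustration; check the API reference for the actual response shape.

```python
import time
from typing import Callable, Mapping


def wait_until_balancer_ready(
    fetch_status: Callable[[], Mapping[str, str]],  # hypothetical status fetcher
    ready_value: str = "ENABLED",  # assumed status string, not confirmed by this post
    timeout_s: float = 600.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the balancer reports `ready_value`; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        # The "status" key is an assumed field name.
        if fetch_status().get("status") == ready_value:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

A caller would only start a broker addition or removal once this returns `True`, and would surface a timeout to the operator instead of proceeding. Injecting `sleep` and `clock` keeps the loop easy to test without real waiting.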
One of the most powerful features of SBC is its ability to automatically detect and act upon uneven distribution of workloads within a cluster. The even-cluster-load API provides visibility into whether a goal violation for workload distribution has been detected and what SBC is currently doing about it. It also provides more context when balancing fails due to an internal error or user intervention.
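To illustrate how that extra context might be consumed, here is a small sketch that turns a status payload into a one-line summary for an operator. The payload shape is hypothetical: the `status` and `error_message` keys, and the status strings in the usage note, are placeholders chosen for this example and are not taken from the API reference.

```python
from typing import Mapping, Optional


def summarize_even_load(payload: Mapping[str, Optional[str]]) -> str:
    """Summarize a (hypothetical) even-cluster-load payload in one line."""
    status = payload.get("status") or "UNKNOWN"  # assumed field name
    error = payload.get("error_message")  # assumed field name
    if error:
        # Surface the failure context alongside the reported state.
        return f"{status}: balancing did not complete ({error})"
    return f"{status}: no error reported"
```

For example, `summarize_even_load({"status": "FAILED", "error_message": "cancelled by operator"})` yields a line that names both the state and the reason, which is the kind of context the API is meant to expose.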