It's been way too long since the last Metrics news! Let's fix that...
There are a few reasons for this: we were preparing the new 2.0 release of the Metrics Data Platform. We often promised features, and we actually worked on them, but there was an issue. To deliver these features we needed a cleaner stack, one that would scale better, and we needed to reduce some technical debt. We were also facing something every team can experience: over-engineering.
While the platform had many capabilities, the first version of Metrics (at the time: IoT PaaS Time Series) deliberately circumvented some of the native features in order to stay simple. This is why we first offered OpenTSDB as the initial protocol. It was simple to push and query data, and it integrated with Grafana and other collecting tools. Great. But when it came time to add more features, we had to accept that it was time to refactor.
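To give an idea of that simplicity, pushing a data point with the OpenTSDB protocol boils down to a small JSON document sent to the /api/put endpoint. Here is a minimal sketch (the metric name and tags are hypothetical examples):

```python
import json

def opentsdb_put_payload(metric, timestamp, value, tags):
    """Build the JSON body expected by OpenTSDB's /api/put endpoint."""
    return json.dumps({
        "metric": metric,
        "timestamp": timestamp,  # Unix time, in seconds
        "value": value,
        "tags": tags,            # OpenTSDB requires at least one tag
    })

payload = opentsdb_put_payload("sys.cpu.user", 1501200000, 42.5, {"host": "web01"})
print(payload)
```

An HTTP POST of that body to the endpoint is all a collector needs, which is exactly why so many tools speak this protocol.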
Metrics then grew with the needs expressed by its customers, most of whom, at the time, were inside OVH. We identified various needs and issues our customers were facing. For example, customers were dealing with series management issues: a single scollector instance, by default and given typical hardware specs, can generate between 350 and 1,000 series for a single host, when you only need a few series for CPU, memory, disk and network. When you manage hundreds of thousands of instances, this greed has a cost. This led to the development of two new tools, Noderig and Beamium, which we've released as Open Source. We will introduce them on this blog in the coming days.
But one of the major issues with OpenTSDB was that its query capabilities were far too limited. Customers needed to convert bits into bytes, then correlate one series with another. They needed to compute a topK on arbitrary criteria, extract histograms from value distributions, apply signal processing to their Time Series (Fourier Transform, pattern detection, outlier detection), apply Holt-Winters and exponential smoothing to forecast or detect anomalies, run real-time Dynamic Time Warping, and so on.
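To make one of those operations concrete, here is a minimal sketch of simple exponential smoothing in plain Python. It is purely illustrative, not the platform's implementation; on the platform this kind of processing runs server-side through the query language:

```python
def exponential_smoothing(values, alpha=0.5):
    """Simple exponential smoothing: each output point blends the new
    observation with the previous smoothed value, weighted by alpha."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

print(exponential_smoothing([10.0, 12.0, 8.0, 11.0], alpha=0.5))
# → [10.0, 11.0, 9.5, 10.25]
```

A higher alpha tracks the raw series more closely; a lower alpha smooths out noise, which is what makes it useful for forecasting and anomaly detection.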
It's a huge feature request list, but... guess what: the Metrics platform already had it! We only needed to exploit its full potential instead of restricting it to OpenTSDB's limitations.
Since the stack has been based on Warp10, an Open Source Geo Time Series® platform, from the very beginning, we were able to offer the Warp10 query language (WarpScript) to early alpha testers. Feedback has been outstanding! While WarpScript needs a bit of effort to tackle, it really shines when you have a real problem to solve. WarpScript provides a unique dataflow approach to querying, and it features a true programming language that can be used over your Time Series data. This will be the toolkit that powers Metrics Analytics.
We're currently working on a tour to ease the learning curve for those who want to play with it. We will also cover some internal uses of WarpScript in a future blog post.
Many of you have been waiting for this! It's now available. If we open WarpScript to customers, why not open the Warp10 protocol as well?
We recommend the Warp10 protocol for many reasons:
- More comprehensive (support for Geo Location data)
- More concise
- More efficient (no JSON parsing)
- Better integration with Beamium
Of course, we continue to support OpenTSDB, because there will be use cases where it just works for you.
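For reference, a Warp10 ingestion line is a single plain-text record of the form `TS/LAT:LON/ELEV name{labels} value`, where the geo coordinates and elevation are optional. Here is a hypothetical helper that builds such lines (the metric names and labels are made up; timestamps are in microseconds, the platform default):

```python
def warp10_line(ts_us, name, labels, value, lat=None, lon=None):
    """Build one line of the Warp10 plain-text ingestion format:
    TS/LAT:LON/ELEV name{labels} value
    Geo position and elevation are optional, hence '//' when absent."""
    geo = f"{lat}:{lon}" if lat is not None and lon is not None else ""
    label_str = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    if isinstance(value, str):
        value = f"'{value}'"  # string values are single-quoted
    return f"{ts_us}/{geo}/ {name}{{{label_str}}} {value}"

# A numeric point with geo coordinates, and a string "event" point:
print(warp10_line(1501200000000000, "temperature", {"room": "kitchen"}, 22.5, 48.86, 2.35))
print(warp10_line(1501200000000000, "deploy.event", {"app": "api"}, "v2.0 released"))
```

The same compact line carries the timestamp, optional geo location, labels and value, which is why there is no JSON parsing cost on ingestion.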
Supporting Geo Time Series® and Analytics with WarpScript means you can query your geo data points with geo-fencing queries. For example: ask for the devices that, in the last hour, were located along this road, inside this area, but not that one.
We now also support string values as data points. This means you can push events like upgrades, crashes, bugs, etc., and display them as annotations on a graph, for example in Grafana. We will cover this later with visual examples.
Prometheus & PromQL
We like the way Prometheus helps developers and ops simplify application and system monitoring. APM is no longer a luxury, and by now every business should be run on numbers. A good way to achieve this is by instrumenting your apps. Prometheus provides an easy way to do so, building on the same idea as CodaHale/Dropwizard Metrics, but exposing a /metrics HTTP endpoint that can be scraped. This is where Beamium steps in.
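A /metrics endpoint just returns plain text in the Prometheus exposition format. As a minimal hand-rolled sketch (hypothetical metric name, no client library), the body a scraper like Beamium fetches looks like this:

```python
def render_metrics(counters):
    """Render a dict of counter values in the Prometheus text
    exposition format: a '# TYPE' hint line, then 'name value'."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

body = render_metrics({"http_requests_total": 1027})
print(body)
```

In practice you would use an official Prometheus client library rather than formatting lines by hand; the point is only that instrumentation boils down to serving this text over HTTP.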
On the query side, where WarpScript can feel cumbersome for simple cases, PromQL can be lighter and easier to use. We're currently testing our implementation of PromQL, with full protocol coverage.
I told you we needed to revamp some of our components. Here is the list:
- New offers
- New Manager
- New Token subsystem
- New API
- New management API
- New web page
- New throttling subsystem
- New documentation
- New proxies for query and ingestion
Basically, the entire management stack has been rewritten to gain agility and to be able to ship more features, more often. We also reworked some technical parts, like the ingestion and query proxies.
The new token management has a small impact on your side, since we changed the token format. Old tokens will keep working for a few weeks, but you will have to replace them. Why? The first iteration relied on a synchronisation between the database where tokens are encrypted and the proxies. That implied the sync had to be distributed and consistent: two constraints we can avoid by using cryptographic tokens (macaroon-like). So now, when you generate a new token, it's instantly available for READ or WRITE access. You can still embed labels inside the token if you need to isolate data from different customers based on a pivot label.
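To illustrate why cryptographic tokens remove the need for synchronisation, here is a hypothetical HMAC-based sketch (not our actual token format): any proxy holding the signing key can validate a token entirely on its own, with no database round-trip, so a freshly minted token is usable immediately.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"proxy-shared-secret"  # hypothetical key shared by the proxies

def make_token(payload):
    """Sign a payload so any holder of SECRET can verify it offline."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token):
    """Return the embedded payload, or None if the token was tampered with."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))

token = make_token({"access": "READ", "labels": {"customer": "acme"}})
print(verify_token(token))
```

Embedded claims (like a pivot label per customer) travel inside the token itself, which is the same property macaroons provide with the added ability to attenuate a token after issuance.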
We've worked on new documentation: https://docs.ovh.com/gb/en/cloud/metrics/
There is still work to do, but now that the new offer is live, our priority is to show you what you can achieve with Metrics. If you have issues with it, tell us. If you don't understand something, it's our fault, and our mission to fix it.
Two new products: Metrics Cloud & Metrics Live
In order to better reflect customer needs, we have extended the offer. We now have two products:
- Metrics Cloud
- Metrics Live
Metrics Cloud is the PaaS offer: subscribe to a plan, name it, create your token from the manager or from api.ovh.com, and use it. You will be able to upgrade to a higher plan as your needs increase. Metrics Cloud includes plans from XXS to XL, but if you need more, that's possible too: just contact us.
Metrics Live is the in-memory counterpart of Metrics Cloud. It's a dedicated instance of the Metrics stack that runs in memory and can be used to perform extremely large aggregations. Our customers use it for atomic availability checks to drive robots and quick business decisions, or to perform large aggregations (over millions of series) whose results are persisted on Metrics Cloud.
Metrics Cloud and Metrics Live are two complementary products that our users combine to get the best of both worlds: in-memory processing and long-term storage. The offer is only available in FR for now; the translation is on the way! Here is the web page.
Other posts will cover what we've also been working on that is soon to be released. As a teaser:
- Metrics Live
- Hadoop/Spark/BI integration
There is still so much to say about this new release, so let's keep some details for the coming days!