SLOs: Internally at Google, our Site Reliability Engineering (SRE) teams alert themselves only on customer-facing symptoms of problems, not on all potential causes. This better aligns them with customer interests, lowers their toil, frees them to do value-added reliability engineering, and increases job satisfaction. Stackdriver Service Monitoring lets you set, monitor, and alert on SLOs. Because Istio and App Engine are instrumented in an opinionated way, we know exactly what the transaction counts, error counts, and latency distributions are between services. All you need to do is set your targets for availability and performance, and we automatically generate the graphs for service level indicators (SLIs), compliance to your targets over time, and your remaining error budget. You can configure the maximum allowed drop rate for your error budget; if that rate is exceeded, we notify you and create an incident so that you can take action. To learn more about SLO concepts, including error budget, we encourage you to read the SLO chapter of the SRE book.
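
To make the error budget concept concrete, here is a minimal sketch of the arithmetic (our own illustration, not a Stackdriver API call): with a 99.9% availability SLO, 0.1% of requests are allowed to fail, and the remaining budget is that allowance minus the failures observed so far.

// Error budget arithmetic for an availability SLO (illustrative sketch only).
function errorBudgetRemaining(sloTarget, totalRequests, failedRequests) {
  // e.g. a 99.9% target allows 0.1% of all requests to fail
  const allowedFailures = (1 - sloTarget) * totalRequests;
  // fraction of the error budget still unspent
  return 1 - failedRequests / allowedFailures;
}

// 10,000,000 requests this month and 6,000 failures against a 99.9% target:
// 10,000 failures are allowed, so 40% of the error budget remains.
console.log(errorBudgetRemaining(0.999, 10000000, 6000)); // 0.4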

Service Dashboard: At some point, you will need to dig deeper into a service’s signals. Maybe you received an SLO alert and there’s no obvious upstream cause. Maybe the service is implicated by the service graph as a possible cause for another service’s SLO alert. Maybe you have a customer complaint outside of an SLO alert that you need to investigate. Or, maybe you want to see how the rollout of a new version of code is going.

The service dashboard provides a single coherent display of all signals for a specific service, all of them scoped to the same timeframe with a single control, providing you the fastest possible way to get to the bottom of a problem with your service. Service monitoring lets you dig deep into the service’s behavior across all signals without having to bounce between different products, tools, or web pages for metrics, logs, and traces. The dashboard gives you a view of the SLOs in one tab, the service metrics (transaction rates, error rates, and latencies) in a second tab, and diagnostics (traces, error reports, and logs) in the third tab.

Once you’ve validated an error budget drop in the first tab and isolated anomalous traffic in the second tab, you can drill down further in the diagnostics tab. For performance issues, you can drill down into long tail traces, and from there easily get into Stackdriver Profiler if your app is instrumented for it. For availability issues you can drill down into logs and error reports, examine stack traces, and open the Stackdriver Debugger, if the app is instrumented for it.

Stackdriver Service Monitoring gives you a whole new way to view your application architecture, reason about its customer-facing behaviors, and get to the root of any problems that arise. It takes advantage of infrastructure software enhancements that Google has championed in the open-source world, and leverages the hard-won knowledge of our SRE teams. We think this will fundamentally transform the ops experience of cloud-native and microservice development and operations teams. To learn more, see the presentation and demo with Descartes Labs at GCP Next last week. We hope you will sign up to try it out and share your feedback.

Setting up the uptime check

Now let’s set up the website uptime check. Open the Stackdriver monitoring menu in your GCP cloud console.

In this case, we created a little web server instance with a public IP address. We want to monitor this public IP address to check the web server’s uptime. To set this up, select “Uptime Checks” from the right-side menu of the Stackdriver monitoring page.

Remember: This is a test case, so we set the check interval to one minute. For real-world use cases, this value might change according to the service monitoring requirements.

Once you have set up the Uptime Check, you can set up an alerting policy. Click on “Create New Policy” in the popup window that follows (it only appears the first time you create an Uptime Check). Or you can click on “Alerting” in the left-side Stackdriver menu and then click on “Create a Policy” in the popup menu.

Setting up the alert policy

Once you click on “Create a Policy,” you should see a new popup with four steps to complete.

The first step will ask for a condition “when” to trigger the alert. This is where you have to make sure the Uptime Check is added. To do this, simply click on the “Add Condition” button.

A new window will appear from the right side:

Specify the Uptime Check by clicking on Select under “Basic Health.”

This will bring up this window (also from the right side) to select the specific Uptime Check to alert on. Simply choose “URL” in the “Resource Type” field and the “IF UPTIME CHECK” section will appear automatically. Here, we select the previously created Uptime Check.


You can also set the duration of the service downtime to trigger an alert. In this case, we used the default of five minutes. Click “Save Condition” to continue with the Alert Policy setup.

This leads us to step two:

This is where things get interesting. To include an external monitoring system, you can use so-called webhooks: callouts that use an HTTP POST to send JSON-formatted messages to the external system. The on-prem or third-party monitoring system needs to understand this format in order to use the messages properly; support for receiving and processing webhooks is widespread in the monitoring industry.
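
To illustrate what sits on the other end, here is a minimal sketch of a webhook receiver (assuming Node.js with the Express framework; the endpoint path and port are arbitrary). Stackdriver POSTs a JSON document describing the incident to the configured URL, and the receiver just needs to accept it and respond with a success status:

// Minimal webhook receiver sketch (assumes Node.js + Express).
const express = require('express');

const app = express();
app.use(express.json()); // parse incoming JSON bodies

app.post('/stackdriver-webhook', (req, res) => {
  // req.body is the parsed JSON notification sent by Stackdriver.
  console.log('Received notification:', JSON.stringify(req.body, null, 2));
  res.sendStatus(200); // acknowledge receipt
});

app.listen(8080, () => console.log('Webhook receiver listening on :8080'));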

Setting up the alerts

Now you’ll set up the alerts. In this example, we’re configuring a webhook only. You can set up multiple ways to get alerted simultaneously. If you want to get an email and a webhook at the same time, just configure it that way by adding the second (or third) method. In this example, we’ll use a free webhook receiver to monitor if our setup works properly.

Once the site has generated a webhook receiver for you, you’ll have a link you can use that lists all received requests for you. Remember, this is for testing purposes only. Do not send in any user-specific data such as private IP addresses or service names.

Next you have to configure the notification to use a webhook so it’ll send a message over to our shiny new webhook receiver. Click on “Add Notification.”

By default a field will appear saying “Email”—click on the drop-down arrow to see the other options:

Select “Webhook” in the drop-down menu.

The system will most likely tell you that there is no webhook set up yet. That’s because you haven’t specified any webhook receiver so far. Click on “Setup Webhook.”

(If you’ve already set up a webhook receiver, the system won’t offer you this option here.)

In that case, go to the “select project” drop-down list (at the top left, right next to the Stackdriver logo in the gray bar area). Click on the down arrow symbol (next to your project ID) and select “Account Settings” at the bottom of the drop-down box.

In the popup window, select “Notifications” (bottom of the left-side list under “Settings”) and then click on “Webhooks” at the top menu. Here you can add additional webhooks if needed.

Click on “Create webhook.”

Remember to put in your webhook endpoint URL. In our test case, we do not need any authentication.

Click on “Test Connection” to verify and see your first webhook appearing on the test site!

It should say “This is a test alert notification from Stackdriver.”

Now let’s continue with the Alerting Policy. Choose the newly created webhook by selecting “Webhook” as notification type and the webhook name (created earlier) as the target. If you want to have additional notification settings (like SMS, email, etc.), feel free to add those as well by clicking on “Add another notification.”

Once you add a notification, you can optionally add documentation by creating a so-called “Markdown document.” If you’re not familiar with it, you can learn more about the Markdown language in its documentation.

Last but not least, give the Alert Policy a descriptive name:

We decided to go super creative and call it “HTTP - uptime alert.” Once you have done this, click “Save Policy” at the bottom of the page.

Done! You just created your first policy, including a webhook to trigger alerts on incidents.

The policy should be green and the uptime check should report your service being healthy. If not, check your firewall rules.

Test your alerting

If everything is normal and works as expected, it is time to test your alerting policy. To do that, simply delete the “allow-http” firewall rule created earlier. This should result in a “service unavailable” condition for our Uptime Check. Remember to give it a little while: the Uptime Check waits 10 seconds per region and one minute overall before it declares the service down (remember, we configured a one-minute check interval earlier).

Now you’ll see that you can’t reach the nginx web server instance anymore:

Now let’s go to the Stackdriver overview page to see if we can find the incident. Click on “Monitoring Overview” in the left-side menu at the very top:

Indeed, the Uptime Check comes back red, telling us the service is down. Also, our Alerting Policy has created an incident saying that the “HTTP - uptime alert” has been triggered and the service has been unavailable for a couple of minutes now.

Let’s check the test receiver site to see if we got the webhook to trigger there:

You can see we got the webhook alert with the same information regarding the incident. This information is passed on in JSON format for easy parsing at the receiving end. You will see the policy name that was triggered (first red rectangle), the state “open,” and the “started at” timestamp in Unix time format (seconds elapsed since January 1, 1970). The “summary” field tells you that the service is failing. If you had configured any optional documentation, you’d see it in the JSON payload as well.

Bring the service back

Now, recreate the firewall rule to see if we get an “incident resolved” message.

Let’s check the overview screen again (give it five or six minutes after recreating the rule for the check to react).

You can see that the service is back up. Stackdriver automatically resolves open incidents once the condition clears. So in our case, the formerly open incident is now resolved, since the Uptime Check comes back as “healthy” again. This information is also passed on through the alerting policy. Let’s see if we got a “condition restored” webhook message as well.

By the power of webhooks, it also told our test monitoring system that this incident is closed now, including useful details such as the ending time (Unix timestamp format) and a summary telling us that the service has returned to a normal state.

If you need to connect Stackdriver to a third-party monitoring system, webhooks are an extremely flexible way of doing so. They let your operations team continue using their familiar go-to resources on-premises, while taking advantage of everything Stackdriver offers in a GCP (or AWS) environment. Furthermore, existing monitoring processes can be reused to bridge into the Google Cloud world.

Remember that Stackdriver can do far more than Uptime Checks, including log monitoring, source code debugging, and tracing user interactions with your application. Whether it’s alerting policy functionality, webhook messaging or other checks you define in Stackdriver, all of it can be forwarded to a third-party monitoring tool. Even better, you can close incidents automatically once they have been resolved.

Have fun monitoring your cloud services!

Related content:

New ways to manage and automate your Stackdriver alerting policies
How to export logs from Stackdriver Logging: new solution documentation
Monitor your GCP environment with Cloud Security Command Center



The Logs Ingestion page, above, now shows last month’s volume in addition to the current month’s volume for the project and by resource type. We’ve also added handy links to view detailed usage in Metrics Explorer right from this page.

The Monitoring Resource Usage page, above, now shows your metrics volume month-to-date vs. the last calendar month (note that these metrics are brand-new, so they will take some time to populate). All projects in your Stackdriver account are broken out individually. We’ve also added the capability to see your projected total for the month and added links to see the details in Metrics Explorer.

3. Analyzing Stackdriver costs using the API and Metrics Explorer
If you’d like to understand which logs or metrics are costing the most, you’re in luck—we now have even better tools for viewing, analyzing and alerting on metrics. For Stackdriver Logging, we’ve added two new metrics:
  • logging.googleapis.com/billing/bytes_ingested provides real-time incremental delta values that can be used to calculate your rates of log volume ingestion. It does not cover excluded logs volume. This metric provides a resource_type label to analyze log volume by various monitored resource types that are sending logs.
  • logging.googleapis.com/billing/monthly_bytes_ingested provides your usage as a month-to-date sum every 30 minutes and resets to zero every month. This can be useful for alerting on month-to-date log volume so that you can create or update exclusions as needed.
We’ve also added a new metric for Stackdriver Monitoring to make it easier to understand your costs:
  • monitoring.googleapis.com/billing/bytes_ingested provides real-time incremental deltas that can be used to calculate your rate of metrics volume ingestion. You can drill down and group or filter by metric_domain to separate out usage for your agent, AWS, custom or logs-based metrics. You can also drill down by individual metric_type or resource_type.
You can access these metrics via the Monitoring API, create charts for them in Stackdriver or explore them in real time in Metrics Explorer (shown below), where you can easily group by the provided labels in each metric, or use Outlier mode to find the metric or resource types with the highest usage. You can read more about aggregations in our documentation.
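
If you prefer to pull these numbers programmatically, here is a rough sketch of querying the last 24 hours of logging.googleapis.com/billing/bytes_ingested with the Node.js client library, summed per resource_type. The library call shape, label path, aggregation settings and project ID are our assumptions to adapt to your environment, not an official sample:

// Sketch: sum bytes_ingested over the last 24 hours, grouped by resource_type.
const monitoring = require('@google-cloud/monitoring');

async function logBytesByResourceType(projectId) {
  const client = new monitoring.MetricServiceClient();
  const now = Math.floor(Date.now() / 1000);

  const [series] = await client.listTimeSeries({
    name: client.projectPath(projectId),
    filter: 'metric.type="logging.googleapis.com/billing/bytes_ingested"',
    interval: {
      startTime: {seconds: now - 24 * 60 * 60},
      endTime: {seconds: now},
    },
    // Sum the delta samples into one point per day per resource_type.
    aggregation: {
      alignmentPeriod: {seconds: 24 * 60 * 60},
      perSeriesAligner: 'ALIGN_SUM',
      crossSeriesReducer: 'REDUCE_SUM',
      groupByFields: ['metric.label.resource_type'],
    },
  });

  for (const ts of series) {
    const resourceType = ts.metric.labels.resource_type;
    const bytes = ts.points.length ? ts.points[0].value.int64Value : 0;
    console.log(`${resourceType}: ${bytes} bytes ingested in the last 24h`);
  }
}

logBytesByResourceType('my-project-id').catch(console.error);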

If you’re interested in an even deeper analysis of your logs usage, check out this post by one of Google’s Technical Solutions Consultants that will show you how to analyze your log volume using logs-based metrics in Datalab.


Controlling your monitoring and logging costs
Our new pricing model is designed to make the same powerful log and metric analysis we use within Google accessible to everyone who wants to run reliable systems. That means you can focus on building great software, not on building logging and monitoring systems. This new model brings you a few notable benefits:
  • Generous allocations for monitoring, logging and trace, so many small or medium customers can use Stackdriver on their services at no cost.
    • Monitoring: All Google Cloud Platform (GCP) metrics and the first 150 MB of non-GCP metrics per month are available at no cost.
    • Logging: 50 GB free per month, plus all admin activity audit logs, are available at no cost.
  • Pay only for the data you want. Our pricing model is designed to put you in control.
    • Monitoring: When using Stackdriver, you pay for the volume of data you send, so a metric sent once an hour costs 1/60th as much as a metric sent once a minute. You’ll want to keep that in mind when setting up your monitoring schedules. We recommend collecting key logs and metrics via agents or custom metrics for everything in production; development environments may not need the same level of visibility. For custom metrics, you can reduce cost by writing points less frequently (at a coarser time granularity). Another way is to reduce the number of time series you send by avoiding unnecessary labels on custom and logs-based metrics that may have high cardinality.
    • Logging: The exclusion filter in Logging is an incredible tool for managing your costs. The way we’ve designed our system to manage logs is truly unique. As the image below shows, you can choose to export your logs to BigQuery, Cloud Storage or Cloud Pub/Sub without needing to pay to ingest them into Stackdriver.
      You can even use exclusion filters to collect a percentage of logs, such as 1% of successful HTTP responses. Plus, exclusion filters are easy to update, so if you’re troubleshooting your system, you can always temporarily increase the logs you’re ingesting.

Putting it all together: managing to your budget
Let’s look at how to combine the visibility from the new metrics with the other tools in Stackdriver to follow a specific monthly budget. Suppose we have $50 per month to spend on logs, and we’d like to make that go as far as possible. Between the 50 GB monthly free allotment and what $50 buys in paid ingestion, we can afford to ingest about 150 GB of logs for the month. Looking at the Log Ingestion page, shown below, we can easily get an idea of our volume from last month: 200 GB. We can also see that 75 GB came from our Cloud Load Balancer, so we’ll add an exclusion filter for 99% of 200 responses.
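
As a sketch of what that exclusion filter could look like (the exact resource type and field names depend on your load balancer logs), sample(insertId, 0.99) matches a deterministic 99% of entries by insertId, and everything an exclusion filter matches is dropped, so roughly 1% of successful responses are kept:

resource.type="http_load_balancer" AND httpRequest.status=200 AND sample(insertId, 0.99)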

To help make sure we don’t go over our budget, we’ll also set a Stackdriver alert, shown below, for when we reach 145 GB on the monthly log bytes ingested. Based on the cost of ingesting log bytes, that’s just before we’ll reach the $50 monthly budget threshold.
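
If you’d rather manage that alert in code, here is a rough sketch using the Node.js client library. The field names, enum strings and project ID are our assumptions based on the Monitoring API’s alert policy resource; adjust the threshold, aggregation and notification channels to your setup:

// Sketch: alert when month-to-date log ingestion exceeds 145 GB.
const monitoring = require('@google-cloud/monitoring');

async function createLogBudgetAlert(projectId) {
  const client = new monitoring.AlertPolicyServiceClient();
  const [policy] = await client.createAlertPolicy({
    name: client.projectPath(projectId),
    alertPolicy: {
      displayName: 'Monthly log bytes ingested above 145 GB',
      combiner: 'OR',
      conditions: [{
        displayName: 'monthly_bytes_ingested > 145 GB',
        conditionThreshold: {
          filter:
            'metric.type="logging.googleapis.com/billing/monthly_bytes_ingested"',
          comparison: 'COMPARISON_GT',
          thresholdValue: 145 * 1024 * 1024 * 1024, // 145 GB in bytes
          duration: {seconds: 0},
          aggregations: [{
            alignmentPeriod: {seconds: 1800}, // the metric updates every 30 minutes
            perSeriesAligner: 'ALIGN_MAX',
          }],
        },
      }],
      // Attach notificationChannels here to receive the email described above.
    },
  });
  console.log(`Created policy: ${policy.name}`);
}

createLogBudgetAlert('my-project-id').catch(console.error);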

Based on this alerting policy, suppose we get an email near the end of the month that our volume is at 145 GB for the month to date. We can turn off ingestion of all logs in the project with an exclusion filter like this:
logName:*

Now only admin activity audit logs will come through, since they don’t count toward any quota and can’t be excluded. Let’s suppose we also have a requirement to save all data access logs on our project. Our sinks to BigQuery for these logs will continue to work, even though we won’t see those logs in Stackdriver Logging until we disable the exclusion filter. So we won’t lose that data during that period of time.


As with your household budget, running out of funds at the end of the month isn’t a best practice. Turning off your logs should be considered a last resort, similar to turning off the water in your house toward the end of the month. Both scenarios run the risk of making it harder to put out fires or handle incidents that come up. One such risk is that if you have an issue and need to contact GCP support, they won’t be able to see your logs and may not be able to help you.


With these tools, you’ll be able to plan ahead and avoid ingesting less useful logs throughout the month. You might turn off unnecessary logs based on use, rejigger production and development environment monitoring or logging, or decide to offload data to another service or database. Our new metrics, views and dashboards give you many more ways to see how much you’re spending, in both resources and IT budget, on Stackdriver. You’ll be able to bring flexibility and efficiency to logging and monitoring, and avoid unpleasant surprises.


To learn more about Stackdriver, check out our documentation or join in the conversation in our discussion group.



Using structured log data has some key benefits, including making it easier to quickly parse and understand your log data. The chart below shows the differences between unstructured and structured log data. 

You can see here how much more detail is available at a glance:



Example from custom logs

Unstructured log data:

...
textPayload: A97A7743 purchased 4 widgets.
...

Structured log data:

...
jsonPayload: {
 "customerIDHash": "A97A7743",
 "action": "purchased",
 "quantity": "4",
 "item": "widgets"
}
...

Example from Nginx logs (now available as structured data through the Stackdriver logging agent)

Unstructured log data:

textPayload: 127.0.0.1 10.21.7.112 - [28/Feb/2018:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Chrome/66.0"

Structured log data:

time:
1519786800 (28/Feb/2018:12:00:00 +0900)

jsonPayload: {
 "remote" : "127.0.0.1",
 "host"   : "10.21.7.112",
 "user"   : "-",
 "method" : "GET",
 "path"   : "/",
 "code"   : "200",
 "size"   : "777",
 "referer": "-",
 "agent"  : "Chrome/66.0"
}


Making structured logs work for you
You can send both structured and unstructured log data to Stackdriver Logging. Most logs Google Cloud Platform (GCP) services generate on your behalf, such as Cloud Audit Logging, Google App Engine logs or VPC Flow Logs, are sent to Stackdriver automatically as structured log data.

Since Stackdriver Logging also passes the structured log data through export sinks, sending structured logs makes it easier to work with the log data downstream if you’re processing it with services like BigQuery and Cloud Pub/Sub.

Using structured log data also makes it easier to alert on log data or create dashboards from your logs, particularly when creating a label or extracting a value with a distribution metric, both of which apply to a single field. (See our previous post on techniques for extracting values from Stackdriver logs for more information.)
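
If your application writes to Stackdriver Logging directly rather than through the agent, you can emit structured entries from code as well. Here is a minimal sketch using the Node.js client library; the log name and payload fields are placeholders, and passing an object instead of a string produces a jsonPayload like the custom-log example above:

// Sketch: write the custom-log example above as a structured jsonPayload entry.
const {Logging} = require('@google-cloud/logging');

async function writeStructuredEntry() {
  const logging = new Logging();
  const log = logging.log('purchases'); // placeholder log name

  // Passing an object (rather than a string) produces a jsonPayload.
  const entry = log.entry({resource: {type: 'global'}}, {
    customerIDHash: 'A97A7743',
    action: 'purchased',
    quantity: '4',
    item: 'widgets',
  });

  await log.write(entry);
}

writeStructuredEntry().catch(console.error);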

Try Stackdriver Logging for yourself
To start using Stackdriver structured logging today, you’ll just need to install (or reinstall) the Stackdriver logging agent with the --structured flag. This also enables automatic parsing of common log formats, such as syslog, Nginx and Apache.

curl -sSO "https://dl.google.com/cloudagents/install-logging-agent.sh"
sudo bash ./install-logging-agent.sh --structured

For more information on installation and options, check out the Stackdriver structured logging installation documentation.

To test Stackdriver Logging and see the power of structured logs for yourself, you can try one of our most asked-for Qwiklab courses, Creating and alerting on logs-based metrics, for free, using a special offer of 15 credits. This offer is good through the end of May 2018. Or try our new structured logging features out on your existing GCP project by checking out our documentation.

Note: You can only set policy user labels via the Monitoring API.

@mentions for Slack

Slack notifications now include the alerting policy documentation. This means that you can include customized Slack formatting and control sequences for your alerts. For the various options, please refer to the Slack documentation.

One useful feature is linking to a user. So for example, including this line in the documentation field

@backendoncall policy ${policy.display_name} triggered an incident


notifies the user backendoncall in addition to sending the message to the relevant Slack channel described in the policy’s notification options.

Notification examples

Now, when you look at a Stackdriver notification, all notification methods (with the exception of SMS) include the following fields:

  • Incident ID/link: the incident that triggered the notification along with a link to the incident page 
  • Policy name: the name of the configured alerting policy
  • Condition name: the name of the alerting policy condition that is in violation

Email:


Slack:


Webhook:


{
   "incident":{
      "incident_id":"0.kmttg2it8kr0",
      "resource_id":"",
      "resource_name":"totally-new cassweb1",
      "started_at":1514931579,
      "policy_name":"Backend processing utilization too high",
      "condition_name":"Metric Threshold on Instance (GCE) cassweb1",
      "url":"https://app.google.stackdriver.com/incidents/0.kmttg2it8kr0?project=totally-new",
      "documentation":{
         "content":"CPU utilization sample. This might affect our backend processing.\u000AFollowing playbook here: https://my.sample.playbook/cassweb1",
         "mime_type":"text/markdown"
      },
      "state":"open",
      "ended_at":null,
      "summary":"CPU utilization for totally-new cassweb1 is above the threshold of 0.8 with a value of 0.994."
   },
   "version":"1.2"
}
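
For example, a receiving system might handle this payload along these lines (a sketch; the field names follow the JSON above, and the resolved branch assumes the follow-up notification Stackdriver sends once the condition clears):

// Sketch: react to the incident payload shown above.
function handleStackdriverNotification(body) {
  const incident = body.incident;
  const startedAt = new Date(incident.started_at * 1000); // Unix seconds -> Date

  if (incident.state === 'open') {
    console.log(`OPEN: ${incident.policy_name} since ${startedAt.toISOString()}`);
    console.log(`Summary: ${incident.summary}`);
    console.log(`Details: ${incident.url}`);
  } else {
    // When the condition clears, a follow-up notification arrives with a
    // non-open state and an ended_at timestamp, so the ticket can be closed.
    const endedAt = incident.ended_at ? new Date(incident.ended_at * 1000) : null;
    console.log(`RESOLVED: ${incident.policy_name}` +
        (endedAt ? ` at ${endedAt.toISOString()}` : ''));
  }
}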


Next steps

We’ll be rolling out these new features in the coming weeks as part of the regular updating process. There’s no action needed on your part, and the changes will not affect the reliability or latency of your existing alerting notification pipeline. Of course, we encourage you to give meaningful names to your alerting policies and conditions, as well as add a “documentation” section to configured alerting policies to help oncall engineers understand the alert notification when they receive it. And as always, please send us your requests and feedback, and thank you for using Stackdriver!


By default, data access logs are not displayed in this feed. To enable them from the Filter configuration panel, select the “Data Access” field under Categories. (Please note, you also need to have the Private Logs Viewer IAM permission in order to see data access logs). You can also filter the results displayed in the feed by user, resource type and date/time.

Interacting with audit logs in Stackdriver

You can also interact with the audit logs just like any other log in the Stackdriver Logs Viewer. With Logs Viewer, you can filter or perform free text search on the logs, as well as select logs by resource type and log name (“activity” for the admin activity logs and “data_access” for the data access logs).

Here are some log entries in their JSON format, with a few important fields highlighted.
In addition to viewing your logs, you can also export them to Cloud Storage for long-term archival, to BigQuery for analysis, and/or Google Cloud Pub/Sub for integration with other tools. Check out this tutorial on how to export your BigQuery audit logs back into BigQuery to analyze your BigQuery spending over a specified period of time.
"Google Cloud Audit Logs couldn't be simpler to use; exported to BigQuery it provides us with a powerful way to monitor all our applications from one place.Darren Cibis, Shine Solutions

Partner integrations

We understand that there are many tools for log analysis out there. For that reason, we’ve partnered with companies like Splunk, Netskope, and Tenable Network Security. If you don’t see your preferred provider on our partners page, let us know and we can try to make it happen.

Alerting using Stackdriver logs-based metrics

Stackdriver Logging provides the ability to create logs-based metrics that can be monitored and used to trigger Stackdriver alerting policies. Here’s an example of how to set up your metrics and policies to generate an alert every time an IAM policy is changed.

The first step is to go to the Logs Viewer and create a filter that describes the logs for which you want to be alerted. Be sure that the scope of the filter is set correctly to search the logs corresponding to the resource in which you are interested. In this case, let’s generate an alert whenever a call to SetIamPolicy is made.
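
A filter along these lines (a sketch; adjust the resource scope to whatever you want to watch) matches project-level IAM policy changes in the admin activity audit log:

resource.type="project"
protoPayload.methodName="SetIamPolicy"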

Once you're satisfied that the filter captures the correct events, create a logs-based metric by clicking on the "Create Metric" option at the top of the screen.

Now, choose a name and description for the metric and click "Create Metric." You should then receive a confirmation that the metric was saved.
Next, select “Logs-based Metrics” from the side panel. You should see your new metric listed there under “User Defined Metrics.” Click on the dots to the right of your metric and choose "Create alert from metric."

Now, create a condition to trigger an alert if any log entries match the previously specified filter. To do that, set the threshold to "above 0" in order to catch this occurrence. Logs-based metrics count the number of matching entries seen per minute. With that in mind, set the duration to one minute; the duration specifies how long this per-minute rate needs to be sustained in order to trigger an alert. For example, if the duration were set to five minutes, there would have to be at least one matching log entry per minute for a five-minute period in order to trigger the alert.

Finally, choose “Save Condition” and specify the desired notification mechanisms (e.g., email, SMS, PagerDuty, etc.). You can test the alerting policy by giving yourself a new permission via the IAM console.

Responding to audit logs using Cloud Functions


Cloud Functions is a lightweight, event-based, asynchronous compute solution that allows you to execute small, single-purpose functions in response to events such as specific log entries. Cloud functions are written in JavaScript and execute in a standard Node.js environment. They can be triggered by events from Cloud Storage or Cloud Pub/Sub; in this case, we'll trigger a cloud function when logs are exported to a Cloud Pub/Sub topic. Cloud Functions is currently in alpha; please sign up to request enablement for your project.

Let’s look at firewall rules as an example. Whenever a firewall rule is created, modified or deleted, a Compute Engine audit log entry is written. The firewall configuration information is captured in the request field of the audit log entry. The following function inspects the configuration of a new firewall rule and deletes it if that configuration is of concern (in this case, if it opens up any port besides port 22). This function could easily be extended to look at update operations as well.

/*
 * Copyright 2017 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

'use strict';

// Triggered by audit log entries exported to a Cloud Pub/Sub topic.
exports.processFirewallAuditLogs = (event) => {
  // Decode the base64-encoded Pub/Sub message into the exported log entry.
  const msg = JSON.parse(Buffer.from(event.data.data, 'base64').toString());
  const logEntry = msg.protoPayload;
  // Only act on audit log entries for newly inserted firewall rules.
  if (logEntry &&
      logEntry.request &&
      logEntry.methodName === 'v1.compute.firewalls.insert') {
    let cancelFirewall = false;
    const allowed = logEntry.request.alloweds;
    if (allowed) {
      // Flag the rule if it opens any port other than 22.
      for (let key in allowed) {
        const entry = allowed[key];
        for (let port in entry.ports) {
          if (parseInt(entry.ports[port], 10) !== 22) {
            cancelFirewall = true;
            break;
          }
        }
      }
    }
    if (cancelFirewall) {
      // Delete the offending firewall rule, identified by the last path segment.
      const resourceArray = logEntry.resourceName.split('/');
      const resourceName = resourceArray[resourceArray.length - 1];
      const compute = require('@google-cloud/compute')();
      return compute.firewall(resourceName).delete();
    }
  }
  return true;
};

As the function above uses the @google-cloud/compute Node.js module, be sure to include it as a dependency in the package.json file that accompanies the index.js file containing your source code:
{
  "name" : "audit-log-monitoring",
  "version" : "1.0.0",
  "description" : "monitor my audit logs",
  "main" : "index.js",
  "dependencies" : {
    "@google-cloud/compute" : "^0.4.1"
  }
}

In the image below, you can see what happened to a new firewall rule (“bad-idea-firewall”) that did not meet the acceptable criteria as determined by the cloud function. It's important to note that this cloud function is not applied retroactively, so existing firewall rules that allow traffic on ports 80 and 443 are preserved.

This is just one example of many showing how you can leverage the power of Cloud Functions to respond to changes on GCP.


Conclusion


Cloud Audit Logging offers enterprises a simple way to track activity in applications built on top of GCP, and integrate logs with monitoring and logs analysis tools. To learn more and get trained on audit logging as well as the latest in GCP security, sign up for a Google Cloud Next ‘17 technical bootcamp in San Francisco this March.

The Datalab environment also makes it possible to do advanced analytics. For example, in the included notebook, Time-shifted data.ipynb, we walk through time-shifting the data by day to compare today vs. historical data. This powerful analysis allows you to identify anomalies in your system metrics at a glance, by visualizing how they change from their historical values.

Compare today’s CPU utilization to the weekly average by zone

Stackdriver metrics, viewed with Cloud Datalab


Get started


The first step is to sign up for a 30-day free trial of Stackdriver Premium, which can monitor workloads on GCP and AWS. It takes two minutes to set up. Next, set up Cloud Datalab, which can be easily configured to run on Docker with this Quickstart. Sample code and notebooks for exploring trends in your data, analyzing group performance and heat map visualizations are included in the Datalab container.

Let us know what you think, and we’ll do our best to address your feedback and make analysis of your monitoring data even simpler for you.
