The error qualifier in Go simply means that the function returns an error value if it fails. The underlying code implementing this interface for a Juniper line card differs significantly from the implementation for a Cisco line card, but the caller of the function is insulated from the implementation. The upper-level code imports the library, and when it operates on a line card, it can only perform one of the three actions we specified above.
We then realized that we could apply the same interface to many hardware components—for example, a fan. For certain vendors, the Online() and Offline() functions did nothing because those vendors didn't support turning a fan off; in those cases we used the interface only to check status.
type Fan interface {
    Online() error
    Offline() error
    Status() error
}
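As an illustration (a sketch of our own, with a hypothetical vendor and field names, not the production code), an implementation for such a vendor might look like this:

// hypotheticalFan sketches a Fan implementation for a vendor whose fans
// cannot be switched on or off in software.
type hypotheticalFan struct {
    addr string // management address of the device (hypothetical field)
}

// Online and Offline are deliberate no-ops for this vendor.
func (f *hypotheticalFan) Online() error  { return nil }
func (f *hypotheticalFan) Offline() error { return nil }

// Status is the only method that does real work: a full implementation
// would poll the device and return an error if the fan is unhealthy.
func (f *hypotheticalFan) Status() error {
    return nil
}

The caller is insulated from these differences; it simply checks the returned error, for example: if err := fan.Status(); err != nil { ... }.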
Building upon this line of thought, we realized that we could generalize this interface to define a common interface for all hardware components within a device.
By structuring the code this way, anyone can add a device from a new vendor. Moreover, anyone can add any type of new component as a library. Once the library implements this common interface, it can be registered as a handler for that specific vendor and component.
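As a rough sketch (the names below are ours for illustration, not those of the production system), the common interface and a registry of vendor handlers might look like this:

// Component is a sketch of a common interface every hardware library implements.
type Component interface {
    Online() error
    Offline() error
    Status() error
}

// factories maps a vendor/component pair to a constructor for its handler.
var factories = map[string]func(target string) Component{}

// Register is called by each library to claim a vendor/component pair.
func Register(vendor, component string, f func(target string) Component) {
    factories[vendor+"/"+component] = f
}

// New returns a handler for the given vendor and component, if one is registered.
func New(vendor, component, target string) (Component, bool) {
    f, ok := factories[vendor+"/"+component]
    if !ok {
        return nil, false
    }
    return f(target), true
}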
Deciding what to automate
The system needed to interact with humans at various stages of the automation. To decide what to automate, we drew a flow chart of the normal human-driven repair sequence and boxed the stages we believed we could replace with automation, using the task of replacing a vendor control plane board as an example. Many of the steps have self-explanatory names; these are definitions of some of the more complex ones:
Determine control plane: Find the faulty control plane unit.
Determine state: Is it the master or the backup?
Copy image to control plane: Copy the appropriate software image to the master control plane.
Offline control plane: Take the backup control plane offline.
Toggle mastership: Make the replaced control plane the new master.
Figure 1: Manual workflow for replacing a vendor control plane board
When we needed to carry out this workflow, a Google network engineer performed each step in Figure 1, except for pulling out and replacing the failed control plane, which was handled by someone on-site at the data center.
Once we had defined this task, we created an automated workflow. The goal of the new system was to provide a UI for our hardware engineers in a data center that allowed them to perform one of these operations at a specific time, under specific conditions, and with various automated safety checks, followed by a full device audit at the end of the operation. Previously, a human had performed all of these steps; now a human only needed to perform the "hardware gets replaced" step in Figure 2.
Figure 2: Automated workflow for replacing a vendor control plane board
Automation, before and after
Figure 3: High-level system view.
You can see in Figure 3 what the system looked like after automation. Before automating this workflow, there was a lot of manual work. When an alert initially came in, an engineer would stop traffic to the device and take the bad component offline by hand. Our network operations center (NOC) team would then work with the vendor (for example, Juniper or Cisco) to get a replacement part on-site. Next, we would file a change request in our change management system, noting the date of the operation.
On the day of the operation:
The data center technician clicks "start" on the change management system to begin the repair.
Our system picks up this change and is ready to begin the repair.
The technician clicks “start” on our UI.
An "offline" state machine steps through the stages needed to take the component offline safely (see the sketch after this list).
The UI notifies the technician at each step of the way.
Once the state machine has completed, it notifies the technician, who can safely replace the component.
Once the component is replaced and re-cabled, the technician returns to the UI and starts the "online" state machine, which safely returns the component to production.
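As a simplified sketch (our own illustration, not the production code), an "offline" state machine can be modeled as an ordered list of steps, each of which reports progress to the UI and aborts the workflow if a safety check fails:

// step is one stage of the offline workflow: a name shown in the UI and an
// action that returns an error if its safety check fails.
type step struct {
    name   string
    action func() error
}

// runOffline executes each step in order, notifying the UI as it goes, and
// stops at the first failure so the component is never left in an unsafe state.
func runOffline(steps []step, notify func(msg string)) error {
    for _, s := range steps {
        notify("starting: " + s.name)
        if err := s.action(); err != nil {
            notify("failed: " + s.name + ": " + err.Error())
            return err
        }
        notify("completed: " + s.name)
    }
    return nil
}

The "online" state machine follows the same pattern with the steps reversed.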
When we reviewed our original automation design, we noticed there would be a lot of work involved in building the various systems needed to implement the automated workflow. To facilitate collaboration, we created ticket items for each component of the system, so multiple engineers could work on the project in parallel.
Automation lessons learned
We used an iterative approach in our planning and execution. We first focused on replacing the line card for one vendor, then moved on to multiple vendors and multiple components. Due to the modular design of the code base and the interacting systems, adding more modules and scaling the code horizontally was easy.
For example, adding a new library that handled fan replacements meant simply writing the code for it and ensuring that it implemented the common interface above; the new library then registered itself in the main function, as sketched below.
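Continuing the hypothetical sketches above, that registration in the main function could be as small as:

func main() {
    // Hypothetical: register the new fan handler alongside the existing
    // vendor and component handlers, then start the automation service.
    Register("examplevendor", "fan", func(target string) Component {
        return &hypotheticalFan{addr: target}
    })
    // ... register other handlers and start serving requests.
}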
We had the option to extend or repurpose existing automation systems owned by our software management teams to meet our needs. We had to carefully consider whether to use those systems or build our own, potentially duplicating work if we chose the latter. Ultimately, we built our own automation because the other systems were understaffed. Trying to extend their tools would have disrupted other teams' project work and delayed our own project.
What worked well
Leveraging multiple engineers to automate our internal part of the workflow allowed us to take the project from design to implementation within a short period—about one year.
What didn’t
We haven't yet fully automated our hardware replacement workflow. Doing so involves troubleshooting hardware issues with vendors and persuading them that each individual failure merits a device or component replacement. We work around this gap in our automation by keeping spares on-site for use with our repair automation, and by handling the vendor portion of the process separately and mostly manually through our NOC. We are currently working with our vendor partners toward fully automating this interaction.
Measuring automation success
We can measure the hours our automation saves engineers using Google's production change logging service, which all internal tools use to record changes made to the production environment. The service logs changes made by tools manually invoked by engineers as well as by tools that provide end-to-end automation without manual input. We can therefore compare how long each network repair action took when performed manually with the number of repair actions that today's fully automated system performs; multiplying the historical manual duration of each action by the number of automated actions gives the total time saved. As shown in Figure 4, network hardware repair automation saves us hundreds of hours every month.
Tips for reducing toil through automation
While strategies for eliminating toil must be tailored to your individual environment and use cases, some approaches are universal. Based upon our own experience eliminating toil by automating network repair tasks, we recommend the following:
Measure your toil.
Tackle the biggest sources of toil first, and don't try to solve all problems at once.
Carefully consider whether to enhance existing tools or build new ones. Even if you can partially repurpose another team's work, would creating a tool from scratch actually make more sense cost- or resource-wise?
Take a design-driven approach: start small and iterate quickly. Don't try to design the perfect approach from the start.
Measure your time savings to determine your return on investment.
Automation has proved useful for our team of network site reliability engineers at GCP. Learn more about the practice of SRE and how you might apply its principles to your own network projects.
At this point, you might simply try to run the application by navigating to the main method and clicking the play button:
… and configuring the input arguments to translate some text from English to French by editing the newly created run configuration:
Run the program again, and this time you get the following error:
As you may have already guessed, you’re missing authentication rights to access the Cloud Translation API from your local machine. To overcome this, you’d normally have to go through the following steps:
Create a new service account with the appropriate roles for accessing the service
Update your local run configuration with the necessary environment variables to access the service
Thankfully, the Cloud Tools for IntelliJ plugin can help. In IntelliJ, navigate to the Cloud Tools menu item under “Tools > Google Cloud Tools > Add Cloud libraries …”:
Select the Cloud Translation API and your GCP project, and click “Add Cloud Libraries”:
In the confirmation window that appears, you can see that Cloud Tools for IntelliJ takes care of enabling the API and creating the service account for you:
Lastly, select the run configuration that you created earlier so that the plugin can inject the necessary environment variables for accessing the Cloud Translation service from your local machine:
Run the program again and your input text is successfully translated from English to French using the Cloud Translation service:
The Cloud Tools for IntelliJ plugin also assists with the following:
Adding Java client libraries to your Maven pom.xml if they are not already present
Writing a Bill of Materials (BOM) to your pom.xml to help avoid dependency version conflicts
Detecting and acting on potential misconfigurations, including a missing BOM, through pom.xml file inspections with quick-fixes
The Cloud Tools for IntelliJ plugin provides many more features to help optimize your development workflow, including support for Google App Engine, Stackdriver Debugger, Cloud Repositories, and Cloud Storage. For more information and to leave feedback, please visit the official documentation and GitHub pages:
“We needed a consistent platform to deploy and manage containers on-premise and in the cloud. As Kubernetes has become the industry standard, it was natural for us to adopt Kubernetes Engine on GCP to reduce the risk and cost of our deployments.”
- Dinesh Keswani, Global Chief Technology Officer at HSBC
Cloud Services Platform is technologically and architecturally aligned with the joint hybrid cloud products we've been developing and bringing to market with our partner, Cisco, with whom we have been collaborating closely. Our joint solution, Cisco Hybrid Cloud Platform for Google Cloud, will be generally available next month and is now certified to be consistent with Kubernetes Engine, enabling GCP out of the box.
Today, let’s take a look at aspects of the Cloud Services Platform, and how it lays a foundation for a fully realized cloud infrastructure.
Modernizing application architecture with Istio
Last year, we took a step toward helping organizations move from reactive IT management to proactive service operations—the idea of managing at a higher layer of the stack, enabling greater application awareness and control. In collaboration with several industry partners, we announced Istio, an open-source service mesh that gives operators the controls they need to manage microservices at scale. We are excited to say that open-source Istio will move to version 1.0 shortly, making it ready for production deployments.
Building on that open-source foundation, we are announcing a managed Istio service that you can use to manage services within a Kubernetes Engine cluster. Managed Istio, in alpha, is an Istio-powered service mesh available in Kubernetes Engine, complete with enterprise support. Managed Istio accelerates your journey to service operations with three high-level capabilities:
Service discovery and intelligent traffic management—Managed Istio surfaces all the services running in your cluster and manages network traffic between them. It provides application-level load balancing and sophisticated traffic routing for container and VM workloads, along with health checks, canary and blue/green deployments, and fault-tolerance features such as circuit breaking and timeouts.
Secure, authenticated communications—Managed Istio offers segmentation and granular policy for endpoints, helps with compliance and detection of anomalous behavior, and encrypts traffic by default using mTLS.
Monitoring and management—Understand and troubleshoot the system of services running across Managed Istio, including integration with Stackdriver, our suite of monitoring and management tools.
It's still early days, but we are very excited about Istio and Managed Istio, foundational technologies that will drive the use of containers and microservices, while helping to make your environment much more manageable, scalable and available.
Enterprise-grade Kubernetes, wherever you go
A great path to well-managed applications is undoubtedly containers and microservices, and having a common Kubernetes management layer can help get you there that much faster. Four years ago, we released Kubernetes, and the resulting Kubernetes Engine managed service is battle-tested and growing by leaps and bounds: In 2017 Kubernetes Engine core-hours grew 9X year over year.
Today, we are excited to bring that same managed Kubernetes Engine experience to your on-premise infrastructure. GKE On-Prem, soon to be in alpha, is Google-configured Kubernetes that you can deploy in the environment of your choice. GKE On-Prem makes it easy to install and upgrade Kubernetes and provides access to the following capabilities across GCP and on-premise:
Unified multi-cluster registration and upgrade management
Centralized monitoring and logging with Stackdriver integration
Hybrid Identity and Access Management
GCP Marketplace for Kubernetes applications
Unified cluster management for GCP and on-premise
Professional services and enterprise-grade support
Now, with GKE On-Prem, you can begin to modernize existing applications on-premise, without necessarily moving to the cloud. You gain control of your journey to the cloud at your own pace.
Automatically take control of your Kubernetes workloads
When it comes to managing clusters at scale, it’s imperative to have the right security controls in place and ensure your policies can be easily managed and enforced. Today, we’re pleased to announce GKE Policy Management which delivers centralized capabilities that make it far easier for administrators to configure Kubernetes (wherever it may be running).
With GKE Policy Management, Kubernetes administrators create a single source of truth for their policies that automatically syncs with any enrolled cluster. GKE Policy Management supports policies stored as definitions in a repository, and can also use your existing Google Cloud IAM policies to make it simple to secure your clusters. GKE Policy Management is coming soon to alpha; sign up here to express interest.
A service-centric view of your environment
More than simply making it easier to migrate workloads to the cloud, the technologies in Cloud Services Platform lay the groundwork for improving service operations by giving administrators a service-centric view of their infrastructure, rather than an infrastructure-centric view of their services. Today, we are announcing Stackdriver Service Monitoring, which provides the following new views:
Service graph: A real-time bird’s-eye visualization of the entire environment—see all your microservices, how they communicate, and their dependencies.
Service level objective (SLO) monitoring: Monitor and alert in the same customer-centric, low-toil manner as Google Site Reliability Engineers (SRE) do for our own services.
Service dashboard: All your signals for a given service are in a single place so that you can debug faster and easier than ever before and lower your mean-time-to-resolution (MTTR).
Stackdriver Service Monitoring is designed for workloads running on opinionated Istio infrastructure, as well as App Engine.
When microservices become APIs
Microservices provide a simple, compelling way for organizations to accelerate moving workloads to the cloud, serving as a path towards a larger cloud strategy. Istio enables service discovery, connection and management for microservices. But as soon as those services are needed for internal groups, partners or developers outside of the enterprise, they quickly cross the line and become APIs.
Just as organizations need services management for microservices, they need API management for their APIs. Apigee API Management complements Istio with the robust features of Google Cloud's Apigee API management platform, Apigee Edge, by extending API management natively into the microservices stack. Apigee Edge features include API usage, access, productization, catalog and discovery, plus a developer portal to create a smooth experience for developers and increase API consumption.
Making cloud all it could be
Here at Google, we could never have done what we do today without containers and Kubernetes, but taking a service-oriented view of our operations has been equally critical. In addition to the core capabilities mentioned above, Cloud Services Platform provides access to other new areas of functionality:
GKE serverless add-on lets you run serverless workloads on Kubernetes Engine with a one-step deploy. You can go from source to containers amazingly fast, auto-scale your stateless container-based workloads, and even scale down to zero. Sign up for an early preview for the GKE serverless add-on here.
Knative (pronounced kay-nay-tiv) is a set of open-source serverless components built from the same technology that enables the GKE serverless add-on. Knative lets you create modern, container-based, cloud-native applications by providing the building blocks you need to build and deploy container-based serverless applications anywhere on Kubernetes.
Cloud Build is a fully-managed Continuous Integration/Continuous Delivery (CI/CD) platform that lets you build, test, and deploy software quickly, at scale.
Now, with Cloud Services Platform, we’re excited to bring the full potential of the cloud to you, wherever your workloads may be. For more on Cloud Services Platform, you can read about how it relates to serverless computing.
To publish an API in Apigee Edge, an API developer typically needs to execute the following steps:
Log in to the Apigee Edge user interface with their credentials
Create a new API proxy, configure backend target, add policies
Add a callout policy to select the appropriate business integration process
Save and deploy the API proxy
Access Google Cloud services from the Apigee Edge user interface
API developers want to easily access and connect with Google Cloud services like Cloud Firestore, Cloud Pub/Sub, Cloud Storage, and Cloud Spanner. In each case, there are a few steps to perform to deal with security, data formats, request/response transformation, and even wire protocols for those systems.
Apigee Edge includes a new feature that simplifies interacting with these services and enables connectivity to them through a first-class policy interface that an API developer can simply pick from the policy palette and use. Once configured, these can be reused across all API proxies.
We’re working to expand this feature to cover more Google Cloud services. Simultaneously, we’re working with Informatica to include connections to other software-as-a-service (SaaS) applications and legacy services like hosted databases.
Publish business integration processes as managed APIs
Integration architects, working to connect data and applications across the enterprise, play an important role in packaging and publishing business integration processes as great API products. Working with Informatica, we’ve made this possible within Informatica’s Integration Cloud.
Integration architects who use Informatica's Integration Cloud for Apigee can now author composite services using business integration processes to orchestrate data services and applications, and publish them directly to Apigee Edge as managed APIs. This pattern is useful when the final destination of the API call is an Informatica business integration process.
To use this feature, integration architects need to execute the following steps:
Log in to their Informatica Integration Cloud user interface
Create a new business integration process or modify an existing one
Create a new service of type "Apigee," select the options (policies) presented in the wizard, and publish the process as an API proxy
Apply additional policies to the generated API proxy by logging in to the Apigee Edge user interface.
API documentation can be generated and published on a developer portal, and the API endpoint can be shared with app developers and partners.
APIs are an increasingly central part of organizations’ digital strategy. By working with Informatica, we hope to make APIs even more powerful and pervasive. Click here for more on our partnership with Informatica.
Kubernetes and Docker send Linux signals to your application inside the container to stop it. They send those signals to the process with the process identifier (PID) 1. If you want your application to stop gracefully when needed, you need to properly handle those signals.
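For example, here is a minimal Go program (a generic sketch, not tied to any particular framework) that traps SIGTERM, the signal Kubernetes sends before stopping a container, and shuts down cleanly:

package main

import (
    "log"
    "os"
    "os/signal"
    "syscall"
)

func main() {
    // Relay SIGTERM and SIGINT to a channel instead of letting them kill
    // the process immediately.
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

    log.Println("running; waiting for a termination signal")
    sig := <-sigs

    // Perform the graceful shutdown here: stop accepting new work, flush
    // buffers, close connections, then exit.
    log.Printf("received %v, shutting down cleanly", sig)
}

For the signal to reach your application at all, the application should run as PID 1 inside the container, which is one reason to prefer the exec form of ENTRYPOINT or CMD over the shell form.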
Docker can cache layers of your images to accelerate later builds. This is a very useful feature, but it introduces some behaviors that you need to take into account when writing your Dockerfiles. For example, you should add the source code of your application as late as possible in your Dockerfile so that the base image and your application’s dependencies get cached and aren’t rebuilt on every build.
Take this Dockerfile as an example:
FROM python:3.5
COPY my_code/ /src
RUN pip install my_requirements
You should swap the last two lines:
FROM python:3.5
RUN pip install my_requirements
COPY my_code/ /src
In the new version, the result of the pip command is cached, so the command is not rerun each time the source code changes.
Reducing the attack surface of your host system is always a good idea, and it’s much easier to do with containers than with traditional systems. Remove everything that the application doesn’t need from your container. Or better yet, include just your application in a distroless or scratch image. You should also, if possible, make the filesystem of the container read-only. This should get you some excellent feedback from your security team during your performance review.
Who likes to download hundreds of megabytes of useless data? Aim to have the smallest images possible. This decreases download times, cold start times, and disk usage. You can use several strategies to achieve that: start with a minimal base image, leverage common layers between images, and make use of Docker's multi-stage build feature.
Tags are how users choose which version of your image they want to use. There are two main ways to tag your images: semantic versioning, or using the Git commit hash of your application. Whichever you choose, document it and clearly set the expectations that users of the image should have. Be careful: while users expect some tags, like the "latest" tag, to move from one image to another, they expect other tags to be immutable, even if they are not technically so. For example, once you have tagged a specific version of your image with something like "1.2.3", you should never move this tag.
7. Carefully consider whether to use a public image
Using public images can be a great way to start working with a particular piece of software. However, using them in production can come with a set of challenges, especially in a high-constraint environment. You might need to control what’s inside them, or you might not want to depend on an external repository, for example. On the other hand, building your own images for every piece of software you use is not trivial, particularly because you need to keep up with the security updates of the upstream software. Carefully weigh the pros and cons of each for your particular use-case, and make a conscious decision.