Puppet's Cloud Discovery: Know What's Running in Your Cloud

The promise of automation always has been its ability to manage a wide range of tasks across all your systems, whether they're in your own data center or somewhere in the cloud. But in order to automate, you need to know what you have, and that's getting harder these days.

We've all come across orphaned cloud VMs and instances, perhaps spun up for a quick test by a developer, created as a bit of shadow IT or merely forgotten during the press of the latest product release. Regardless of why they were created and forgotten, these instances pose quite a few risks to your time, security and budget. After all, the meter's pretty much always running on cloud instances, orphaned or not.

With AWS, Google Compute Engine, Azure and others, it's easier than ever to spin up new instances and just as easy to lose track of them. It's also easy to lose track of what's running on them, if you ever fully knew at all. Automating those services and packages would help, but it's hard to automate what you don't know you have.

Puppet's new Cloud Discovery seeks to change that by giving you a clear picture of what's running on all your AWS instances—even the ones you didn't know you had. It moves beyond the cloud infrastructure and instances to show you their workloads and everything from packages and firewall rules to users and OS variants.

The idea is that the more you know, the better able you are to manage it with automation tools like Puppet. But if you haven't gotten around to using Puppet, don't worry. You don't need it to use Cloud Discovery. In fact, you don't need to install anything.

You start by creating a Cloud Discovery account and adding your AWS EC2 credentials (for as many accounts as you want), and the tool authenticates with the cloud service. In its first iteration, due out in Q3, Cloud Discovery will work with Amazon Web Services, but Puppet plans to add support for Google, Azure, VMware and others after that.

Once connected, Cloud Discovery uses the AWS API to gather information about your instances, including each instance's private and public hostnames, its launch time and its listed security groups. It does this initial scan across your cloud account quickly and in real time.
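To get a feel for the kind of data that initial scan works with, here is a minimal Python sketch that flattens an EC2 DescribeInstances-style response into one summary per instance. It is not Cloud Discovery's own code, which the article does not show. The function name summarize_instances and the sample data are made up for illustration; the field names (Reservations, Instances, PrivateDnsName, PublicDnsName, LaunchTime, SecurityGroups) follow the shape the EC2 API returns.

```python
from datetime import datetime, timezone


def summarize_instances(response):
    """Flatten a DescribeInstances-style response into one dict per instance."""
    summaries = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            summaries.append({
                "id": inst.get("InstanceId"),
                # EC2 reports a missing hostname as an empty string; use None instead.
                "private_dns": inst.get("PrivateDnsName") or None,
                "public_dns": inst.get("PublicDnsName") or None,
                "launched": inst.get("LaunchTime"),
                "security_groups": [
                    group.get("GroupName")
                    for group in inst.get("SecurityGroups", [])
                ],
            })
    return summaries


# Hand-written sample data (hypothetical IDs and hostnames), shaped like an API reply.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0aaa1111",
                    "PrivateDnsName": "ip-10-0-1-15.us-west-2.compute.internal",
                    "PublicDnsName": "ec2-203-0-113-10.us-west-2.compute.amazonaws.com",
                    "LaunchTime": datetime(2017, 5, 1, 12, 0, tzinfo=timezone.utc),
                    "SecurityGroups": [{"GroupName": "web", "GroupId": "sg-1111"}],
                },
                {
                    "InstanceId": "i-0bbb2222",
                    "PrivateDnsName": "ip-10-0-2-40.us-west-2.compute.internal",
                    "PublicDnsName": "",
                    "LaunchTime": datetime(2016, 11, 20, 8, 30, tzinfo=timezone.utc),
                    "SecurityGroups": [],
                },
            ]
        }
    ]
}

summaries = summarize_instances(sample_response)
for summary in summaries:
    print(summary["id"], summary["launched"].date(), summary["security_groups"])
```

If you have the boto3 library installed and credentials configured, the dictionary returned by boto3.client("ec2", region_name="us-west-2").describe_instances() has this same shape and could be passed in place of the sample.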

From the result list, you can choose which instances to deep-scan, perhaps just the instances in a certain IP range or just the instances in a certain AWS region, like us-west-2. By the time Cloud Discovery is released, the folks at Puppet aim to give users a number of choices based on data points offered up by the AWS API.
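The selection step can be pictured with a short Python sketch. This is only an illustration of filtering an inventory by region and IP range, not the tool's actual interface. The function select_for_deep_scan, the inventory records and their keys are all hypothetical.

```python
import ipaddress


def select_for_deep_scan(instances, cidr=None, region=None):
    """Return the instances that match the optional region and CIDR filters."""
    network = ipaddress.ip_network(cidr) if cidr else None
    selected = []
    for inst in instances:
        if region and inst["region"] != region:
            continue
        if network and ipaddress.ip_address(inst["private_ip"]) not in network:
            continue
        selected.append(inst)
    return selected


# Hypothetical inventory, as it might look after the initial scan.
inventory = [
    {"id": "i-0aaa1111", "region": "us-west-2", "private_ip": "10.0.1.15"},
    {"id": "i-0bbb2222", "region": "us-west-2", "private_ip": "10.0.2.40"},
    {"id": "i-0ccc3333", "region": "us-east-1", "private_ip": "10.0.1.77"},
]

west_only = select_for_deep_scan(inventory, region="us-west-2")
subnet_only = select_for_deep_scan(inventory, cidr="10.0.1.0/24")
both = select_for_deep_scan(inventory, cidr="10.0.1.0/24", region="us-west-2")

print([inst["id"] for inst in both])
```

Passing neither filter returns the whole inventory, which mirrors choosing to deep-scan everything.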

Once the scan begins, Cloud Discovery temporarily drops a small binary on each VM and quickly gathers a broad array of specific details about each one. The main dashboard shows the number of users (and user variants), groups, the EC2 instance types, the OS platforms, services, packages and more. When the scan's done, it removes the small binary and leaves no trace (other than perhaps an SSH session log).

Figure 1. An early view of the Cloud Discovery dashboard after a scan of your AWS instances.

From the results, you can dig further into each data category to discover even more data, including the version—or versions—of packages you have installed. If you have two different versions of openssh-server running on ten different instances, you'll be able to see that and other package variants immediately.
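As a rough sketch of what spotting package variants involves, the Python below groups hosts by the version of one package they report. The scan_results data, the host names, the version strings and the package_variants helper are invented for illustration; Cloud Discovery's real data format is not described in the article.

```python
from collections import defaultdict


def package_variants(scan_results, package):
    """Map each installed version of `package` to the sorted hosts running it."""
    by_version = defaultdict(list)
    for host, packages in scan_results.items():
        version = packages.get(package)
        if version is not None:
            by_version[version].append(host)
    return {version: sorted(hosts) for version, hosts in by_version.items()}


# Hypothetical deep-scan output: host name -> {package name: version}.
scan_results = {
    "web-01": {"openssh-server": "7.2p2", "openssl": "1.0.2g"},
    "web-02": {"openssh-server": "7.2p2", "openssl": "1.0.2g"},
    "db-01": {"openssh-server": "6.6.1p1", "openssl": "1.0.1f"},
    "cache-01": {"openssl": "1.0.2g"},
}

variants = package_variants(scan_results, "openssh-server")
if len(variants) > 1:
    print("openssh-server has", len(variants), "versions in use:")
    for version, hosts in sorted(variants.items()):
        print(" ", version, "on", ", ".join(hosts))
```

Hosts that do not have the package installed, such as cache-01 here, are simply left out of the result.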

At the same time, Cloud Discovery returns a raft of metadata—including memory, storage and IP—and offers a sort of Puppet node graph view of each instance's resources, with the instance itself displayed as the hub, and resources and services as spokes. It's a nice way to get a sense of what an instance is doing with a single glance.

Since Cloud Discovery is looking beyond just the virtual hardware of your AWS instances, it can illuminate inconsistencies. Those might be anything from outdated versions of the openssl package to dated firewall rules. It'll even tell you the ports apps are running on, so you might find you're running httpd on both 80 and 8080 on certain instances—something you might want to fix.

Of course, perhaps your greatest discovery will be those orphaned instances you never knew existed that are playing fast and loose with your security rules or silently running up cloud charges.

If you do nothing else with the information, Cloud Discovery can save you money by guiding you to unused instances, which you can terminate. But the aim really is to use what Cloud Discovery gives you to get a big head start on automating all the machines you have running in the cloud. Now that you know what you have, you can better manage it, and when you deploy your Puppet master and agents on your instances, you can better enforce everything from user and group access rights and firewall settings to package versioning and running services.

Resources

Puppet Cloud Discovery: https://puppet.com/product/cloud-discovery

Cloud-Scale Automation with Puppet: https://puppet.com/resources/ebook/geek-guide-cloud-scale-automation-puppet

______________________

John S. Tonello is Director of IT for NYSERNet, Inc., in Syracuse, New York. He's been a Linux user and enthusiast since he installed his first Slackware system from diskette 20 years ago. You can follow him @johntonello.