devops | Carlos Sanchez's Weblog
The micro-services vs monoliths battle is heating up. The latest munition is the Amazon Prime Video article Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%.
At Adobe Experience Manager Cloud Service we are running the whole range from tiny micro-services to big Java monoliths, so I’ll try to give you my personal, balanced view on the topic.
Any reasonably sized product with a bit of history is going to have a mix of micro-services and monoliths. Micro-services are not about the code but about the organization, and that is their most valuable selling point: you cannot have velocity when multiple teams and lots of people are making decisions and synchronizing multiple codebases. So to move fast you need some micro-services (for some definition of “micro”).
On one hand we have monoliths, which are easier to understand or follow as everything is in the same place, contributed to by multiple teams. They require synchronization and locking around code, releases, tests, etc., as multiple teams need to be involved, and as time passes these monoliths can grow, increasing the synchronization issues. But they are fast and efficient, as all the calls between modules happen in-process and the overhead is minimal because so much functionality lives together.
On the other hand we have micro-services, which are harder to grasp as there are calls between many of them, typically spread out across multiple git repos. Spreading out the compute causes more inefficiencies: network latencies, more overhead as common functionality is duplicated in each micro-service, etc. But the responsibility is clearly delimited through APIs and interfaces, which makes it easier to understand who is responsible and to identify where problems are.
There is a lot of talk about teams owning one service, but I don’t think this is realistic. As time goes by, services are developed and then move into more of a maintenance role that requires less engineering time, and the team moves on to create other services that provide value. So any team will own multiple services as (if) functionality grows.
For our teams, splitting the monolith brings several benefits that stem from two things: full ownership and faster iterations
the service is owned by a team
independent testing and release cycles mean faster time to market
pick the right tool/language for the job (or the one preferred by the team, not necessarily the best 😉 )
Some problems I have seen:
knowledge is limited to the owner team; there is no motivation for other teams to understand a service
duplication of efforts, multiple teams doing the same things (release management, logging, monitoring, etc). This is where Platform teams and Developer eXperience are supposed to jump in to make things easier
duplication of infrastructure and the tooling around it; for example, each micro-service ends up needing its own image registry, databases, etc.
interactions between services are in a bit of a limbo
I’ll be traveling in the following weeks, speaking at:
DevOpsPro in Vilnius, Lithuania: From Monolith to Docker Distributed Applications (May 26th)
MesosCon North America in Denver, CO: CI and CD at Scale: Scaling Jenkins with Docker and Apache Mesos (June 1st)
Jenkins Area Meetup and Docker Boulder meetup in Boulder, CO: CI and CD at Scale: Scaling Jenkins with Docker and Apache Mesos (June 2nd)
Open DevOps in Milan, Italy: Continuous Delivery and the DevOps Way (June 22nd)
If you are around just ping me!
Webinar: Scaling Jenkins with Docker and Kubernetes
Check the video at DevOps.com.
Docker is revolutionizing the way people think about applications and deployments. It provides a simple way to run and distribute Linux containers for a variety of use cases, from lightweight virtual machines to complex distributed micro-services architectures. Kubernetes is an open source project to manage a cluster of Linux containers as a single system, managing and running Docker containers across multiple Docker hosts, offering co-location of containers, service discovery and replication control. It was started by Google and is now supported by Microsoft, Red Hat, IBM and Docker Inc, amongst others. A Jenkins Continuous Integration environment can be dynamically scaled by using the Kubernetes and Docker plugins, using containers to run slaves and jobs and also to isolate job execution.
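As a rough illustration of the idea of containers as disposable build slaves (the actual Kubernetes and Docker plugin configuration is done in the Jenkins UI and is not shown here), a master and an agent can be run as plain Docker containers; the agent image name is a hypothetical placeholder:
# start a Jenkins master; the official image publishes the web UI on 8080
# and the JNLP agent port on 50000
docker run -d --name jenkins -p 8080:8080 -p 50000:50000 jenkins
# start a disposable build agent as another container pointing at that master
# (example/jenkins-swarm-agent and the env variable name are placeholders)
docker run -d --link jenkins:jenkins \
  -e JENKINS_MASTER=http://jenkins:8080 \
  example/jenkins-swarm-agent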
Last week I participated as a panelist in the Continuous Discussions talk hosted by Electric Cloud, and the recording is now available. A bit long, but there are some good points in there.
Some excerpts from Twitter:
@csanchez: “How fast can your tests absorb your devs agility” < and your Ops, and your Infra?
@cobiacomm: @orfjackal says ‘hard to do agile when the customer plan is to release once per year’
@sunandaj17: It’s not just about the tools: #CI is a matter of team policies & conventions & it relies on more than 1 kind of tool
@eriksencosta: “You can’t outsource Agile”.
@djosephsen @cobiacomm: biggest agile obstacles -> long regression testing cycles, unclear dependencies, and rebuilding the wheel
The panelists:
Andrew Rivers – blog.andrewrivers.co.uk
Carlos Sanchez – @csanchez
Chris Haddad – @cobiacomm
Dave Josephsen – @djosephsen
Eriksen Costa – @eriksencosta – blog.eriksen.com.br
Esko Luontola – @orfjackal – www.orfjackal.net
John Ryding – @strife25 – blog.johnryding.com
Norm MacLennan – @nromdotcom – blog.normmaclennan.com
J. Randall Hunt – @jrhunt – blog.ranman.org
Sriram Narayan – @sriramnarayan – www.sriramnarayan.com
Sunanda Jayanth – @sunandaj17
Hosts: Sam Fell (@samueldfell) and Anders Wallgren (@anders_wallgren) from Electric Cloud.
Everybody should be building Docker images! But what if you don’t want to write all those shell scripts, which is basically what the Dockerfile is, a bunch of shell commands in RUN declarations; or if you are already using some Puppet modules to build VMs?
It is easy enough to build a new Docker image from Puppet manifests. For instance I have built this Jenkins slave Docker image, so here are the steps.
The DevOps Israel team has built a number of Docker images on CentOS with Puppet preinstalled, so that is a good start.
FROM devopsil/puppet:3.5.1
Otherwise you can just install Puppet in any bare image using the normal installation instructions. Something to take into account is that Docker images are quite minimal and may not have some needed packages installed. In this case the centos6 image didn’t have tar installed and some things failed to run, and in some CentOS images the centosplus repo needs to be enabled for the installation to succeed.
FROM centos:centos6
RUN rpm --import https://yum.puppetlabs.com/RPM-GPG-KEY-puppetlabs && \
rpm -ivh http://yum.puppetlabs.com/puppetlabs-release-el-6.noarch.rpm
# Need to enable centosplus for the image libselinux issue
RUN yum install -y yum-utils
RUN yum-config-manager --enable centosplus
RUN yum install -y puppet tar
Once Puppet is installed we can apply any manifest to the server; we just need to put the right files in the right places. If we need extra modules we can copy them from the host, maybe using librarian-puppet to manage them. Note that I’m avoiding running librarian or any other tool in the image, as that would require installing extra packages that may not be needed at runtime.
ADD modules/ /etc/puppet/modules/
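If librarian-puppet is used, the modules can be resolved on the host before the docker build, so the image itself does not need Ruby or librarian installed. A minimal sketch, where the module list is just an example:
# Puppetfile (example), resolved on the host with: librarian-puppet install --path modules
forge 'https://forgeapi.puppetlabs.com'
mod 'puppetlabs/stdlib'
mod 'rtyler/jenkins'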
The main manifest can go anywhere, but the default place is /etc/puppet/manifests/site.pp. The Hiera default data configuration goes into /var/lib/hiera/common.yaml.
ADD site.pp /etc/puppet/manifests/
ADD common.yaml /var/lib/hiera/common.yaml
Then we can just run puppet apply and check that no errors happened. With --detailed-exitcodes Puppet exits with code 2 when changes were applied successfully, so we accept that exit code too.
RUN puppet apply /etc/puppet/manifests/site.pp --verbose --detailed-exitcodes || [ $? -eq 2 ]
After that it’s the usual Docker CMD configuration. In this case we call the Jenkins slave jar from a shell script that handles some environment variables with information about the Jenkins master, so it can be overridden at runtime with docker run -e
ADD cmd.sh /cmd.sh
#ENV JENKINS_USERNAME jenkins
#ENV JENKINS_PASSWORD jenkins
#ENV JENKINS_MASTER http://jenkins:8080
CMD su jenkins-slave -c '/bin/sh /cmd.sh'
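The cmd.sh script itself is not shown in this post; a minimal sketch of what it could look like, assuming the Jenkins Swarm client jar was placed at /usr/share/jenkins/swarm-client.jar by the Puppet run (the path and defaults are illustrative):
#!/bin/sh
# cmd.sh (sketch): connect to the Jenkins master using the environment variables,
# falling back to the defaults commented out in the Dockerfile
JENKINS_MASTER=${JENKINS_MASTER:-http://jenkins:8080}
JENKINS_USERNAME=${JENKINS_USERNAME:-jenkins}
JENKINS_PASSWORD=${JENKINS_PASSWORD:-jenkins}
exec java -jar /usr/share/jenkins/swarm-client.jar \
  -master "$JENKINS_MASTER" \
  -username "$JENKINS_USERNAME" \
  -password "$JENKINS_PASSWORD"
At runtime the defaults can then be overridden, for example with docker run -e JENKINS_MASTER=http://ci.example.com:8080 my/jenkins-slave (the image name here is just an example).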
The Puppet configuration is simple enough
node 'default' {
  package { 'wget':
    ensure => present,
  } ->
  class { '::jenkins::slave': }
}
and the Hiera customizations, using a patched Jenkins module for this to work. The slave service is kept stopped and disabled because the container CMD starts the slave process itself instead of an init service.
# Jenkins slave
jenkins::slave::ensure: stopped
jenkins::slave::enable: false
And that’s all; you can see the full source code at GitHub. If you are into Docker, check out this IBM research paper comparing virtual machines (KVM) and Linux containers (Docker) performance.
Previously: (II) Architecture
In Maestro we typically use a Maestro master server and multiple Maestro agents. Each Maestro Agent is just a small service where the actual work happens: it processes the work sent by the master via ActiveMQ and executes the plugins with the data received.
The two main goals of the agent are load distribution and heterogeneous composition support. The more agents running, the more compositions can be executed in parallel, and compositions can target specific agents based on their features, such as architecture, operating system,… which is a must for development environments. For simplicity each agent can only run one composition at a time, but you could have multiple agent processes running in a single server.
It uses Puppet Facter to gather the machine facts (operating system, memory size, cloud provider data,…) and sends all that information to the master, which can use it to filter what compositions run in the agent. For instance I may want to run a composition in a Windows agent, or in an agent that has some specific piece of software installed. Facter supports external facts, so it is really easy to add new filtering capabilities and not be limited to what Facter provides out of the box: a small text file can be added to /etc/facter/facts.d/ and the agent will report it to the master server.
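For example, a new fact can be added on an agent machine as a plain text file; the fact name and value here are made up:
# drop an external fact into Facter's facts.d directory
echo "build_tools=maven,ant,rvm" | sudo tee /etc/facter/facts.d/build_tools.txt
# Facter picks it up like any built-in fact
facter build_tools
# => maven,ant,rvm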
Agents are installed alongside all the tools that may be needed, from Git, to clone repos, to Jenkins swarm, to reuse the agents as Jenkins slaves, or mcollective agents, to allow updating the agent itself automatically with Puppet when new manifests are deployed to the Puppet master. In our internal environment any commit to Puppet manifests or modules automatically triggers our rspec-puppet tests, the deployment of those manifests to the Puppet master, and a cascading Puppet update of all the machines in our staging environment using MCollective. All our Puppet modules are likewise built and tested on each commit, and a new version is published to the Puppet Forge automatically using rspec-puppet and Puppet Blacksmith.
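A sketch of the Rakefile such a module can use, assuming the standard puppetlabs_spec_helper and Puppet Blacksmith rake tasks (exact task names may vary between versions):
# Rakefile
require 'puppetlabs_spec_helper/rake_tasks' # provides rake spec for rspec-puppet
require 'puppet_blacksmith/rake_tasks'      # provides module:bump, module:tag, module:push
# a CI composition can then run, for instance:
#   rake spec             # run the rspec-puppet tests
#   rake module:release   # bump the version, tag the repo and push to the Puppet Forge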
Maestro also supports manually assigning agents to pools, and matching compositions with agent pools, so compositions can be limited to run in a predefined set of agents.
The agent process is written in Ruby and runs under JRuby in the JVM, thus supporting multiple operating systems and architectures and making it easy to write extensions in Java or Ruby. It connects to the master’s Composition Execution Engine through ActiveMQ, using STOMP for messaging.
Plugins
Plugins are small pieces of code written in Java or Ruby that run in the agent to execute the actual work. We have made all plugins available in GitHub so they can be used as examples to create new plugins for custom tasks.
Plugins can be added to Maestro at runtime and automatically show up in the composition editor. The plugin manifest defines the plugin images, what tasks are defined, and what fields are in each task. Based on the workload received, the agent downloads and executes the plugin, which just accesses the fields in the workload and does the actual work, whatever it might be, sending output back to LuCEE and populating the composition context.
For instance the Fog plugin can manage multiple clouds, such as EC2, where it can start and stop instances. The plugin receives the fields defined in the composition (credentials, image id,…), calls the EC2 API, streams the status to the Maestro output (successfully created, instance id,…) and puts some data (ids of the instances created, public ips,…) in the composition context for other tasks to use. All of that in less than 100 lines of code.
The context is important to avoid redefining field values and to provide some meaningful defaults, so if you have a provision task and a deprovision task, the values in the latter are inherited from the former.
Agent cloud manager
The agent cloud manager is a service that runs on Google Compute Engine and watches a number of Maestro installations to provide automatic agent scaling. Based on preconfigured parameters such as min/max number of agents for each agent pool, max waiting time,… and the current status of each agent pool queue, the service can start new machines from specific images, suspend them (destroy the instance but keep the disk), or completely destroy them.
We are also giving Docker a try instead of using full VMs, and have created a couple of interesting Docker images on CentOS for developers: a Jenkins swarm slave image and a build agent image that includes everything we use in development (Java, Ant, Maven, RVM with 1.9, 2.0, 2.1 and JRuby, Git, Svn), all configurable with credentials at runtime.
Previously: (I) Workflow
The Maestro architecture is basically defined by a master server and multiple agents, written in Java and Ruby (JRuby) for the backend and JavaScript for the frontend using AngularJS, and integrating several open source services. It is quite heterogeneous, with multiple languages, build tools, packages,… using the best tool for the job in each part of the stack.
Master
The master services include:
Maestro REST API
End user web interface
Composition Execution Engine (LuCEE)
ActiveMQ for STOMP messaging
PostgreSQL (or MySQL)
MongoDB
Maestro REST API
The REST API is a webapp written in Java, using Spring, packaged with a Jetty server. It is documented with Swagger annotations that automatically generate a really nice web interface that allows trying all the operations from the browser.
It handles caching and security, based on LDAP or database records, and delegates to the Composition Execution Engine (LuCEE), typically through the LuCEE REST API but also via STOMP messaging to avoid continuous polling.
It also implements handlers to execute compositions from GitHub, Git, SVN,… on commit callbacks.
End user web interface
The end user UI is written in AngularJS using the AngularJS Bootstrap components and Less stylesheets. It connects to the REST API, so everything that can be done through the webapp can also be automated using the REST API (automation, automation, automation!). I have found Angular really nice to work with, besides the complicated service, factory, provider,… abstractions, with good modularity and the ability to reuse third party plugins.
It is built with Maven and Grunt (better for the Javascript parts), using Bower to manage all the Javascript dependencies (angular core, bootstrap, ladda button spinner,…), and Karma with PhantomJS for headless UI tests without needing a real browser.
Composition Execution Engine (LuCEE)
LuCEE is a webapp that manages the execution of compositions, sending work to and receiving work from the agents through ActiveMQ STOMP queues, and storing state in the PostgreSQL database. LuCEE uses the Ruote workflow engine for work scheduling, and manages the compositions queue and agent routing: it basically checks what compositions need to be executed and decides in which agent to execute them, based on composition requirements, free agents, and other factors, e.g. prioritizing previously used agents that would likely have a cached copy of sources and dependencies to speed things up.
It is written in Ruby, which made it quick to implement a first version, with a simple REST API using Sinatra and a STOMP connector to send messages to the Maestro REST webapp through ActiveMQ.
It is packaged as a JRuby war with Warbler, and both the LuCEE and REST API wars are run in the same Jetty server, all packaged as an RPM for easier deployment.
ActiveMQ
ActiveMQ handles all the communication between LuCEE, the REST API webapp, and the agents using multiple STOMP queues. All the communication between LuCEE and the agents, such as workloads, agent output, agent status,… is sent over a queue so it can be easily scaled across a high number of agents.
LuCEE also pushes changes in the database to the REST API webapp so it can update the caches without needing continuous polling.
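As a rough illustration of this style of messaging (not Maestro’s actual queue names or payload format), publishing and consuming a workload over STOMP from Ruby could look like this with the stomp gem:
require 'stomp'
require 'json'
# connect to the broker; credentials and queue name are made up for the example
client = Stomp::Client.new('maestro', 'secret', 'localhost', 61613)
# the master side publishes a workload for an agent
client.publish('/queue/agent.work', { 'task' => 'fog.provision', 'image' => 'ami-123456' }.to_json)
# the agent side subscribes and processes workloads as they arrive
client.subscribe('/queue/agent.work') do |msg|
  workload = JSON.parse(msg.body)
  puts "running task #{workload['task']}"
end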
PostgreSQL
LuCEE uses PostgreSQL (or MySQL, or any other SQL database supported by Ruby DataMapper) as its main storage to save compositions, projects, tasks,… The SQL database is also used by the REST API webapp to store permissions and user data when not using LDAP.
MongoDB
We found that in order to do more complex dashboards and reports we needed to store all sorts of unstructured data from the plugins, from run time or status to anything a plugin developer may want, such as the GitHub payload data received or a test stacktrace. That data is sent by the agents to LuCEE and then stored in MongoDB, and can be queried directly (all your data belong to you) or through a reporting pane in the webapp.
Next: (III) Agents
At MaestroDev we have been building what may be called, for lack of a better name, a DevOps Orchestration Engine, and it is long overdue to talk about what we have been doing there and, most importantly, how.
The basic idea of the application is to tie together the different systems involved in a Continuous Delivery cycle: Continuous Integration server, SCM, build tools, packaging tools, cloud resources, notification systems,… and streamline the process through these different tools. So it hooks into a bunch of popular tools to orchestrate interactions between them. An example:
This workflow, or as we call it, composition, will:
download a war file from a Maven repository (previously built by Jenkins)
start an Amazon EC2 instance with Tomcat preinstalled
deploy the war
checkout the acceptance tests from Git
run some tests with Maven (Selenium tests using SauceLabs) against that instance
wait for a user to confirm before moving to the next step (to record the human approval or to do some extra manual tests if needed)
destroy the Amazon EC2 instance
Maestro provides a nice web UI that gives visibility over the composition execution and an aggregated log from all the tools that run during the composition, in a single place.
But the power comes from combining compositions together, as there are tasks for typical flows, such as forking and joining compositions, calling another composition in case of a failure, or waiting for a composition to finish.
Here we have a more complex setup with five compositions tied together.
* – A composition that calls compositions 1 and 2.
1 – A Jenkins build
2 – The acceptance tests composition mentioned before
2a – Notification composition in case the acceptance tests fail
3 – Deployment to production
So you can see that compositions are not just limited to build, test, deploy. The tasks can be combined as needed to build your specific process.
Tasks are contributed by plugins, easily written in Ruby or Java, that define what fields are needed in the UI and what to do with those fields and the composition context. Maestro includes a lot of prebuilt tasks, publicly available on GitHub, from executing shell scripts to Jenkins job creation or Amazon Route 53 record management, but anything is possible.
All the tasks share a common context and use sensible defaults, so if the scm checkout path is not defined it creates a specific working directory for the composition, and that is reused by the Maven, Ant,… plugins to avoid copying and pasting the fields. That’s also how an EC2 deprovision task doesn’t need any configuration if there was a provision task before in the composition: by default it will just deprovision the instances started previously in the composition.
You can take a look at our Maestro public instance, showing some examples and builds of public projects, mostly Puppet modules that are automatically built and deployed to the Puppet Forge, as well as Maestro plugin build and release compositions. In the next posts I’ll be talking about the technologies used and the distributed architecture of Maestro.
Next: (II) Architecture
A few weeks ago I started writing news posts at InfoQ about DevOps, or anything remotely close; that’s the good thing about DevOps meaning something different depending on who you ask 😉
I’d like to write more here too. I have some post ideas about Docker, Puppet, IoT, MQTT,… let’s see if I find the time.
Article originally published at Agile Record magazine Issue #17, Security Testing in an Agile Environment. It can be downloaded for free as a PDF.
Security Testing Using Infrastructure-As-Code
Infrastructure-As-Code means that infrastructure should be treated as code – a really powerful concept. Server configuration, packages installed, relationships with other servers, etc. should be modeled with code to be automated and have a predictable outcome, removing manual steps prone to errors. That doesn’t sound bad, does it?
The goal is to automate all the infrastructure tasks programmatically. In an ideal world you should be able to start new servers, configure them, and, more importantly, be able to repeat it over and over again, in a reproducible way, automatically, by using tools and APIs.
Have you ever had to upgrade a server without knowing whether the upgrade was going to succeed or not for your application? Are the security updates going to affect your application? There are so many system factors that can indirectly cause a failure in your application, such as different kernel versions, distributions, or packages.
When you have a decent set of integration tests it is not that hard to make changes to your infrastructure with that safety net. There are a number of tools designed to make your life easier, so there is no need to tinker with bash scripts or manual steps prone to error.
We can find three groups of tools:
Provisioning tools, like Puppet or Chef, manage the configuration of servers with packages, services, config files, etc. in a reproducible way and over hundreds of machines.
Virtual Machine automation tools, like Vagrant, enable new virtual machines to be started easily in different environments, from virtual machines in VirtualBox or VMware to cloud providers such as Amazon AWS or Rackspace, and then provision them with Puppet or Chef.
Testing tools, like rspec, Cucumber, or Selenium, enable unit and integration tests to be written that verify that the server is in a good state continuously as part of your continuous integration process.
Vagrant
Learning Puppet can be a tedious task: setting up the different pieces (master, agents), writing your first manifests, etc. A good way to start is to use Vagrant, which started as an Oracle VirtualBox command line automation tool, and allows you to create new VMs locally or on cloud providers and provision them with Puppet or Chef easily.
Vagrant projects are composed of base boxes, specifically configured for Vagrant with Puppet/Chef, a vagrant username and password, and any customizations you may want to add, plus the configuration to apply to those base boxes defined with Puppet or Chef. That way we can have several projects sharing the same base boxes where the Puppet/Chef definitions are different. For instance, a database VM and a web server VM can both use the same base box, i.e. a CentOS 6 minimal server, and just have different Puppet manifests; when Vagrant starts them up it will apply the specific configuration. That also allows you to share boxes and configuration files across teams. For instance, one base box with the Linux flavor can be used in a team, and in source control we can have just the Puppet manifests to apply for the different configurations that anybody from Operations to Developers can use. If a problem arises in production, a developer can quickly instantiate an equivalent environment using the Vagrant and Puppet configuration, making issues from a different environment easy to reproduce.
There is a list of available VMs or base boxes ready to use with Vagrant at www.vagrantbox.es, but you can build your own and share it anywhere. For VirtualBox they are just (big) VM files that can be easily built using VeeWee, or by changing a base box and rebundling it with Packer.
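Another option, using just the built-in Vagrant command rather than VeeWee or Packer, is to repackage a VirtualBox VM you have already customized; the VM and box names below are just examples:
# bundle an existing VirtualBox VM as a Vagrant base box
vagrant package --base centos-6-base --output centos-6.box
# register it locally so Vagrantfiles can reference it by name
vagrant box add centos-6 centos-6.box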
Usage
Once you have installed Vagrant and VirtualBox you can create a new project.
Vagrant init will create a sample Vagrantfile, the project definition file that can be customized.
$ vagrant init myproject
Then in the Vagrantfile you can change the default box settings and add basic Puppet provisioning.
config.vm.box = "CentOS-6.4-x86_64-minimal"
config.vm.box_url = "https://repo.maestrodev.com/archiva/repository/public-releases/com/maestrodev/vagrant/CentOS/6.4/CentOS-6.4-x86_64-minimal.box"
# create a virtual network so we can access the vm by ip
config.vm.network "private_network", ip: "192.168.33.13"
config.vm.hostname = "qa.acme.local"
config.vm.provision :puppet do |puppet|
  puppet.manifests_path = "manifests"
  puppet.manifest_file = "site.pp"
  puppet.module_path = "modules"
end
In manifests/site.pp you can try any Puppet code, e.g. create a file:
node 'qa.acme.local' {
  file { '/root/secret':
    mode    => '0600',
    owner   => 'root',
    content => 'secret file, for root eyes only',
  }
}
vagrant up will download the box the first time, start the VM, and apply the configuration defined in Puppet.
$ vagrant up
vagrant ssh will open a shell into the box. Under the hood, Vagrant is redirecting a host port to the box's port 22.
$ vagrant ssh
If you make any changes to the Puppet manifests you can rerun the provisioning step.
$ vagrant provision
The VM can be suspended and resumed at any time
$ vagrant suspend
$ vagrant resume
and later on destroyed, which will delete all the VM files.
$ vagrant destroy
And then we can start again from scratch with vagrant up, getting a completely new VM where we can make any mistakes!
Puppet
In Puppet we can configure any aspect of a server: packages, files, permissions, services, etc. You have seen how to create a file; now let’s see an example of configuring the Apache httpd server and the Linux iptables firewall to open a port.
First we need the Puppet modules that manage httpd and the firewall rules, to avoid writing all the bits and pieces ourselves. Modules are reusable Puppet components that you can find at the Puppet Forge or, typically, in GitHub. To install these two modules into the VM, run the following commands, which will download the modules and install them in the /etc/puppet/modules directory.
vagrant ssh -c "sudo puppet module install --version 0.9.0 puppetlabs/apache"
vagrant ssh -c "sudo puppet module install --version 0.4.2 puppetlabs/firewall"
You can find more information about the Apache and Firewall modules in their Forge pages. We are just going to add some simple examples to manifests/site.pp to install the Apache server with a virtual host that will listen on port 80.
node 'qa.acme.local' {
  class { 'apache': }

  # create a virtualhost
  apache::vhost { "${::hostname}.local":
    port    => 80,
    docroot => '/var/www',
  }
}
Now if you try to access this server on port 80 you will not be able to, as iptables is configured by default to block all incoming connections. Try accessing http://192.168.33.13 (the ip we configured previously in the Vagrantfile for the private virtual network) and see for yourself.
To open the firewall, we need to open the port explicitly in manifests/site.pp by adding
firewall { '100 allow apache':
  proto  => 'tcp',
  port   => '80',
  action => 'accept',
}
and running vagrant provision again. Now you should see Apache’s default page at http://192.168.33.13.
So far we have created a virtual machine where the Apache server is automatically installed and the firewall port opened. You could start from scratch at any time by running vagrant destroy and vagrant up again.
Testing
Let’s write some tests to ensure that everything is working as expected. We are going to use Ruby as the language of choice.
Unit testing with rspec-puppet
rspec-puppet is an rspec extension that allows you to easily unit test Puppet manifests.
Create a spec/spec_helper.rb file to add some shared config for all the specs:
require 'rspec-puppet'

RSpec.configure do |c|
  c.module_path = 'modules'
  c.manifest_dir = 'manifests'
end
and we can start creating unit tests for the host that we defined in Puppet.
# spec/hosts/qa_spec.rb
require 'spec_helper'

describe 'qa.acme.local' do
  # test that the httpd package is installed
  it { should contain_package('httpd') }

  # test that there is a firewall rule set to 'accept'
  it { should contain_firewall('100 allow apache').with_action('accept') }

  # ensure that there is only one firewall definition
  it { should have_firewall_resource_count(1) }
end
After installing rspec-puppet (gem install rspec-puppet), you can run rspec to execute the tests.
...
Finished in 1.4 seconds
3 examples, 0 failures
Success!
Integration testing with Cucumber
Unit testing is fast and can catch a lot of errors quickly, but how can we check that the machine is actually configured as we expected?
Let’s use Cucumber, a BDD tool, to create an integration test that checks whether a specific port is open in the virtual machine we started.
Create a features/smoke_tests.feature file with:
Feature: Smoke tests
  Smoke testing scenarios to make sure all system components are up and running.

  Scenario: Services should be up and listening to their assigned port
    Then the "apache" service should be listening on port "80"
Install Cucumber (gem install cucumber) and run cucumber. The first run will output a message saying that the step definition has not been created yet.
Feature: Smoke tests
Smoke testing scenarios to make sure all system components are up and running.
Scenario: Services should be up and listening to their assigned port # features/smoke_tests.feature:4
Then the "apache" service should be listening on port "80" # features/smoke_tests.feature:5
1 scenario (1 undefined)
1 step (1 undefined)
0m0.001s
You can implement step definitions for undefined steps with these snippets:
Then(/^the "(.*?)" service should be listening on port "(.*?)"$/) do |arg1, arg2|
pending # express the regexp above with the code you wish you had
end
So let’s create a features/step_definitions/tcp_ip_steps.rb file that implements our “service should be listening on port” step by opening a TCP socket.
require 'socket'
require 'uri'

Then /^the "(.*?)" service should be listening on port "(.*?)"$/ do |service, port|
  host = URI.parse(ENV['URL']).host
  begin
    s = TCPSocket.new(host, port)
    s.close
  rescue Exception => error
    raise("#{service} is not listening at #{host} on port #{port}")
  end
end
And rerun Cucumber, this time using an environment variable URL to specify where the machine is running, as used in the step definition:
URL=http://192.168.33.13 cucumber
Feature: Smoke tests
Smoke testing scenarios to make sure all system components are up and running.
Scenario: Services should be up and listening to their assigned port # features/smoke_tests.feature:4
Then the "apache" service should be listening on port "80" # features/step_definitions/tcp_ip_steps.rb:1
1 scenario (1 passed)
1 step (1 passed)
0m0.003s
Success! The port is actually open in the virtual machine.
Wash, rinse, repeat
This was a small example of what can be achieved using Infrastructure-As-Code and automation tools such as Puppet and Vagrant combined with standard testing tools like rspec or Cucumber. When a continuous integration tool like Jenkins is thrown into the mix to run these tests continuously, the result is an automatic end-to-end solution that tests systems as any other code, avoiding regressions and enabling Continuous Delivery – automation all the way from source to production.
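A sketch of the shell step such a Jenkins job could run, assuming the Vagrant project from this article is checked out in the workspace (paths and the URL are the ones used above):
#!/bin/bash
# bring up a clean VM and apply the Puppet manifests
vagrant up --provision || exit 1
# run the unit and integration tests, remembering the result
rspec && URL=http://192.168.33.13 cucumber
result=$?
# always tear the VM down so the next build starts from scratch
vagrant destroy -f
exit $result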
A more detailed example can be found in my continuous-delivery project at GitHub.