Delorean: OpenStack packages from the future

I have been doing some patches on Delorean lately, so I thought it would be a good idea to describe it in this article, as it is not a well-known tool.

Delorean is a tool that builds rpm packages on each commit of a set of git repositories. The goal of this tool is to detect packaging issues with the upstream branches of all the projects we take care of as early as possible. The target repositories are of course the OpenStack-related git repositories that we care about in RDO.

To understand how Delorean works, you must first understand how to build a package from the git repositories.

Building an rpm package from the git repositories

We need two git repositories to build a package:

  • the upstream git repository, with an RDO-specific branch to store patches;
  • the dist-git repository, where you store the spec file and the various files needed for packaging, like the files describing the services you want to start and the patches you need to apply to the sources.

The patches in the dist-git repository are generated from the RDO branch in the upstream git repository. I will not go into too much detail here, as RDO carries very few patches in its packages: the goal is to deliver the pristine upstream experience. For example, the OpenStack Horizon package carries a few patches to change the theme and add the RDO logos.

In Delorean, the RDO team does not apply any extra patch: we want to test the packaging of the upstream repositories without any modification, unlike what we do in the RDO releases, so Delorean uses the upstream git repositories directly.

How Delorean knows which packages to build

Delorean uses the rdopkg python module to get the description of the packages it needs to build. Rdopkg uses a YAML file that describes the name of the packages, where to get the git repositories from, and who the maintainers of the packages are. This YAML file, called rdo.yml, is part of the rdoinfo git repository. Rdopkg stores a clone of this repository in the home directory of the user running Delorean, under $HOME/.rdopkg/rdoinfo.

For example here is the information for the OpenStack Nova package:
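To give an idea of the structure, here is a sketch of such an entry as rdopkg would see it once the YAML is parsed (the field names, URLs and maintainer address are illustrative placeholders, not the real values from rdoinfo):

```python
# Illustrative sketch of an rdo.yml entry once parsed by rdopkg.
# Field names, URLs and the maintainer address are placeholders.
nova = {
    "project": "nova",
    "conf": "core",            # use the "core" package configuration
    "name": "openstack-nova",  # resulting package name
    "upstream": "git://git.example.org/openstack/nova.git",  # placeholder URL
    "distgit": "git://git.example.org/openstack-nova.git",   # placeholder URL
    "maintainers": ["someone@example.com"],
}
print(nova["name"])
```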

It means that for the nova project we use the package configuration named core, so the package name is openstack-nova, and the upstream git repository and the dist-git repository are fetched from the URLs listed in the entry. The keys defined in the package-configs section can also be overridden in each project's section if needed.

All the packages are described this way, and Delorean can use these git repositories for its own needs.


OpenContrail on the controller side

In my previous post I explained how packets are forwarded from point to point within OpenContrail, and we saw the tools available to check which routes are involved in the forwarding. Last time we focused on the agent side, but now we are going to look at another key component: the controller.

The controller acts as a Route Reflector, announcing routes according to what has been done on the Config/API side and what is learned from the peering nodes. As we saw in the data-plane post, the controller uses BGP or XMPP to exchange routes: XMPP between the controllers and the vRouters, BGP between the controllers and the gateway routers. One more thing about XMPP: even if it is used for route announcements, this is not its only goal; it is also used for other purposes like Service-Chaining, Virtual-Machine related information, etc.


Taking the same example as previously, I will explain how the routes are announced between the controllers and the vRouters and between the controllers and the gateway routers. I will also explain the capability of OpenContrail to extend a private network outside the cloud by leveraging L3VPNs.

OK, but what happens when I boot a VM?

Before continuing with the routing aspects of OpenContrail, I think it is useful to explain what happens when we boot a VM and what OpenContrail uses XMPP for between the controllers and the vRouters.

  1. When we boot a VM, for example with Nova, the OpenContrail Nova VIF driver asks the vRouter agent, through its API, to create a vif interface. The request is made with all the information needed to create the interface (name, MAC, etc.) along with the VM UUID.
  2. Thanks to this VM UUID, the vRouter agent subscribes to the controllers for this particular virtual machine.
  3. As a result of this subscription, the vRouter agent will receive all the information (part of the graph) related to this Virtual-Machine (Security-Group, Instance-IP, Routing-Instance, etc.).
  4. At this point the vRouter agent is able to announce routes to the controllers about the reachability of our VM.
  5. The vRouter agent subscribes with the controller to the Routing-Instance to which our VM is connected, so that it will receive all the routing information related to this Routing-Instance along with the information on the Virtual-Network.

As usual, OpenContrail gives you all the information about the XMPP exchanges, thanks to the introspect interfaces. Below are extracts (formatted) from the introspect interfaces when booting a VM.

For the agent side:

http://<agent ip>:8085/Snh_SandeshTraceRequest?x=XmppMessageTrace

For the controller side:

http://<controller ip>:8083/Snh_SandeshTraceRequest?x=BgpTraceBuf
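These pages can also be fetched from a script instead of a browser with a plain HTTP GET; below is a minimal sketch (the agent address is a placeholder, and the Sandesh output is XML that you can parse with your favourite library):

```python
# Minimal sketch: fetch an XMPP trace from a vRouter agent introspect port.
# The address below is a placeholder for your agent IP.
import urllib.request

url = "http://198.51.100.10:8085/Snh_SandeshTraceRequest?x=XmppMessageTrace"
with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))
```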

From High level API resources to Network resources

To explain how routing works within OpenContrail, let's take the same example as in the previous post: two VMs on two different networks.


In order to interconnect the two networks, we add a virtual router between them. The interesting thing is that in OpenContrail this router is not really created: it is just a logical view, even if we can find it when querying the API.

Actually, the virtual router, called Logical-Router in the OpenContrail terminology, is translated into other network resources. A special component handles such translations: the Schema-Transformer, which converts high-level resources into network resources.

When a virtual network is created on the API side, the Schema-Transformer creates a Routing-Instance and associates it with the newly created network. It dynamically allocates a unique Route-Target (which acts as a VNI) and associates it with the Routing-Instance.
On the API side we can check the resources created by the Schema-Transformer, and thus find the Routing-Instance for a specific network, for instance our private network.

And here is the associated Route-Target…

To summarize, we have two virtual networks, each with its own Routing-Instance and a specific Route-Target:

  • private: target:64512:8000001
  • service: target:64512:8000002

There are currently three ways of defining a Route-Target:

  1. 2 bytes of ASN, 4 bytes of value
  2. 4 bytes of ASN, 2 bytes of value
  3. 4 bytes of IP, 2 bytes of value

A Route-Target within OpenContrail is composed of an ASN and a number higher than 8000000 allocated by the Schema-Transformer, which is of course unique per Routing-Instance.
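To make this more concrete, here is a small sketch that encodes such a Route-Target as the corresponding 8-byte BGP extended community (RFC 4360, two-octet-AS form); this is just an illustration, not code taken from OpenContrail:

```python
# Encode "target:64512:8000001" as an extended community: type 0x00
# (two-octet AS specific), subtype 0x02 (route target), then the
# 2-byte ASN and the 4-byte assigned number.
import struct

asn, value = 64512, 8000001
route_target = struct.pack("!BBHI", 0x00, 0x02, asn, value)
print(route_target.hex())  # -> 0002fc00007a1201
```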

Back to our Logical Router: when notified by the API, the Schema-Transformer creates a Route-Target and associates it with the newly created Logical Router.

This Route-Target will be associated with the Routing-Instance of each virtual network to which the Logical Router is linked. Thanks to this Route-Target, shared across the Routing-Instances, the routes of both networks will be leaked at the controller level.

  • private: target:64512:8000001 and target:64512:8000003 (logical-router)
  • service: target:64512:8000002 and target:64512:8000003 (logical-router)

On the API side this gives us two Route-Targets for each of the two Routing-Instances: one for the Virtual-Network/Routing-Instance, and another one coming from the Logical Router:

As seen in the previous post, Routing-Instances are VRFs on the vRouter agent side. A Logical Router is therefore just a way to express import/export rules of routes between VRFs, in other words a way of leaking routes between them.

Controller time!

Now that the Schema-Transformer translated the high level resources, we should be able to find them in the controller thanks to the dedicated introspect interface.

  • http://<controller ip>:8083/Snh_ShowRoutingInstanceReq

Selecting the private Routing-Instance, we get:

  • The Route-Target imported/exported
  • The route tables

We will focus on the route table inet.0, which is the route table for the unicast IPv4 routes.
Clicking on it, we get all the route details for this route table, for example where a route was learned from: os2 and os3, my two compute nodes in this setup, as shown below.

I removed some information from this capture since it is a really wide page, but scrolling to the right we would see that the Next-Hop is indicated as well.

As explained before, we can check which routes are received at the agent level, since it subscribes to the Routing-Instances. Below is the route leaked from the service network:

http://<agent ip>:8085/Snh_SandeshTraceRequest?x=XmppMessageTrace

Routing without a virtual router

We just saw that a virtual router is just a logical view. We may want to set up routing between our two networks without creating a Logical Router. For that, we have to add the same Route-Target to both networks, let's say target:64512:444.
There are two ways of doing that: with the OpenContrail WebUI, editing both networks to add the Route-Target:

or thanks to a command-line tool, thus via the OpenContrail API:
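For instance, with the Python vnc_api bindings, a roughly equivalent sketch could look like this (the API server address, credentials and network names are placeholders, and the exact calls may vary between Contrail releases):

```python
# Sketch: attach a shared Route-Target to both virtual networks through the
# OpenContrail API. Server address, credentials and fq_names are placeholders.
from vnc_api.vnc_api import VncApi, RouteTargetList

api = VncApi(username="admin", password="secret", tenant_name="admin",
             api_server_host="198.51.100.1")

for network in ("private", "service"):
    vn = api.virtual_network_read(fq_name=["default-domain", "admin", network])
    vn.set_route_target_list(RouteTargetList(["target:64512:444"]))
    api.virtual_network_update(vn)
```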

Currently there is no way to manage Route-Targets with the Neutron API, but there is ongoing work through the BGPVPN service plugin/extension.

What about external networks?

For external networks, this is almost the same thing. We just need to add a Route-Target to the Routing-Instance of the "external network" on both sides, the gateway router and OpenContrail. This Route-Target takes the form of an extended BGP community. The gateway routers peering with the controllers over BGP will then be able to import/export routes from/to our "external network", especially the default route on the OpenContrail side and the floating IPs on the gateway router side.

Assuming we use the Route-Target 64512:555 and we provisioned the gateway router via the OpenContrail WebUI, we can add the Route-Target to the routing-instance on the router side. I'm not going to explain here how to set up a gateway router in detail, there is a pretty good explanation here. Below is just an extract of a Juniper router configuration.

Now that we have a gateway router with a default route for the public routing-instance, we should be able to find it on the control side.

Associating a floating IP with a VM, we can check on the router side that the route for the floating IP is correctly announced.

OpenContrail and ExaBGP for fun… troubleshooting

Since the OpenContrail controller is a BGP speaker, we can use the well-known BGP Swiss army knife: ExaBGP. By adding ExaBGP as a BGP peer to OpenContrail, we should be able to dump all the announcements.

Below is an ExaBGP configuration file and a Python script that aim to dump all the BGP traffic in a human-readable way.
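The script side can be as small as the following sketch: assuming ExaBGP is configured to run it as a process with its JSON encoder (the matching ExaBGP configuration is version-dependent and not detailed here), it receives one JSON message per line on stdin and simply pretty-prints it:

```python
#!/usr/bin/env python
# Sketch of a dump script run by ExaBGP as a "process": with the JSON
# encoder enabled, ExaBGP writes one JSON message per line on our stdin.
import json
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        message = json.loads(line)
    except ValueError:
        continue
    # Pretty-print every BGP message (updates, keepalives, state changes...).
    print(json.dumps(message, indent=2, sort_keys=True))
    sys.stdout.flush()
```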

So we can see the floating IP and the Route-Target used for the public network (target:64512:555).


As explained in the previous post, OpenContrail relies on well-known protocols and uses them internally for the control plane and data plane. The ability of OpenContrail to use the same protocols internally and externally allows it to be fully integrated into data centers and simplifies its integration with access networks.

Python 3 Status in OpenStack Liberty

The Python 3 support in OpenStack Liberty made huge, visible progress. Blocking libraries have been ported. Six OpenStack applications are now compatible with Python 3: Aodh, Ceilometer, Gnocchi, Ironic, Rally and Sahara. Thanks to voting python34 check jobs, Python 3 support can only increase: Python 2-only code can no longer be reintroduced by mistake into tested code.

Progress made in OpenStack Juno and Kilo

The OpenStack port to Python 3 started slowly in 2013 during the development of OpenStack Havana. For the previous Python 3 status, see Status of the OpenStack port to Python 3 (Cyril Roelandt, February 2014); see also Why should OpenStack move to Python 3 right now? (Victor Stinner, December 2013).

The year 2013 was focused on porting third-party libraries and Oslo Incubator. The year 2014 was more focused on porting OpenStack clients and OpenStack common libraries. During 2015, we reached the interesting part: porting OpenStack libraries and OpenStack applications. The Moving Apps to Python 3 etherpad prepared the work for the Liberty cycle.

While Python 3.3 was targeted in 2014, the new Python target version is 3.4. In parallel, Python 2.6 support was dropped in more and more OpenStack clients and libraries, which also simplifies the port to Python 3.

setuptools, pbr and environment markers

It was not possible to have some dependencies specific to Python 2 and others specific to Python 3. Since dependencies like eventlet or MySQL-Python were incompatible with Python 3, it was annoying to run tests, even manually. Some dependencies were also installed on Python 2.7 whereas they were only needed on Python 2.6.

A first step (workaround) was to support requirements-py2.txt and requirements-py3.txt for dependencies specific to Python 2 / Python 3, and test-requirements-py2.txt and test-requirements-py3.txt for test requirements.

A lot of work was done in pip (e.g. supporting environment markers in requirements files) and pbr to support environment markers. Environment markers come from PEP 426 (Metadata for Python Software Packages 2.0): see the Environment markers section. Examples of environment markers: sys_platform == 'win32' (detect Windows) and python_version == '2.7' (Python 2.7).
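To see how such a marker evaluates on a given interpreter, the packaging library can be used directly; a quick sketch (assuming packaging is installed, it is not an OpenStack requirement):

```python
# Evaluate environment markers on the current interpreter using the
# "packaging" library (pip install packaging).
from packaging.markers import Marker

for expression in ("python_version == '2.7'", "sys_platform == 'win32'"):
    print(expression, "->", Marker(expression).evaluate())
```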

requirements-py2.txt and requirements-py3.txt were merged back into a single requirements.txt file, and the same was done for test requirements. The advantage of having environment markers is that it becomes possible to publish a universal wheel package which has different dependencies depending on the Python version.

OpenStack Common Libraries

Before Liberty, the port of almost all OpenStack applications was blocked by dependencies, OpenStack libraries and third-party libraries incompatible with Python 3. First, the root oslo-incubator project was slowly ported to Python 3. In the meantime, this project was split into real libraries like oslo.config and oslo.utils. When a new library was created, we tried to ensure that its first release directly supported Python 3.


The eventlet library is used by all OpenStack applications, but it was not compatible with Python 3. Fortunately, most OpenStack clients don't use eventlet and so could be ported to Python 3.

The eventlet project has been slowly ported since 2014. In 2015, it is now fully compatible with Python 3, since version 0.17.3.


The mysql-python library also blocked almost all OpenStack applications.

We tried porting it to Python 3 at first, but the maintainers were not really responsive, so we thought it might be easier to just replace it with another library.

There was a discussion to replace mysql-python with mysqlclient, a fork adding Python 3 support, fixing various bugs and adding some features.

It was decided to replace mysql-python with PyMySQL instead. PyMySQL is a completely new MySQL driver written fully in Python. The main advantage is that it can be monkey-patched by eventlet to become asynchronous.

OpenStack libraries and other libraries

We ported, or helped to port, the following third-party libraries to Python 3 because they blocked OpenStack applications, and we pushed for new releases. We also ported "OpenStack libraries", i.e. libraries written for a specific application, like glance_store for Glance. Releases including Python 3 fixes:

  • ecdsa 0.10
  • glance_store 0.7.0: library written for Glance
  • netifaces 0.10.4
  • nose-exclude 0.4: blocked Horizon unit tests
  • os-brick 0.3.2: library written for Cinder
  • PyEClib 1.0.9
  • python-memcached 1.56: blocked keystone middleware (so all applications)
  • routes 2.2: blocked many applications (cinder, glance, keystone, neutron, etc.)
  • websockify 0.7.0

suds was replaced with suds-jurko: suds blocked cinder, oslo.vmware and nova. suds is no longer maintained, whereas suds-jurko is maintained and supports Python 3.

Python 3 Status in OpenStack Liberty

Python 3 Status of OpenStack Common Libraries

All OpenStack common libraries are now compatible with Python 3: the entire test suite can run with Python 3.4. There are currently 20 libraries:

  • cliff
  • oslo.concurrency
  • oslo-incubator
  • oslo.config
  • oslo.context
  • oslo.db
  • oslo.i18n
  • oslo.log
  • oslo.messaging (*)
  • oslo.middleware
  • oslo.rootwrap
  • oslo.serialization
  • oslosphinx
  • oslotest
  • oslo.versionedobjects
  • oslo.vmware
  • oslo.utils
  • pylockfile
  • stevedore
  • taskflow

(*) The Qpid and AMQP transports of oslo.messaging are not yet compatible with Python 3. The legacy Qpid driver is deprecated, buggy and unmaintained. The AMQP driver is being ported to Python 3; it's almost done.

Python 3 Status of OpenStack Clients

17 clients are fully compatible with Python 3; their entire test suites run with Python 3.4:

  • keystonemiddleware
  • python-barbicanclient
  • python-ceilometerclient
  • python-cinderclient
  • python-glanceclient
  • python-heatclient
  • python-ironicclient
  • python-keystoneclient
  • python-manilaclient
  • python-marconiclient
  • python-neutronclient
  • python-novaclient
  • python-openstackclient
  • python-saharaclient
  • python-swiftclient
  • python-troveclient
  • python-tuskarclient

2 clients are being ported to Python 3:

  • python-designateclient
  • python-fuelclient (StackForge project)

Python 3 Status of OpenStack Applications

6 applications are already fully compatible with Python 3! All unit tests pass on Python 3.4. Congratulations to their maintainers.

  • Aodh
  • Ceilometer
  • Gnocchi
  • Ironic
  • Rally
  • Sahara

7 applications are being ported to Python 3 with a voting python34 check job to avoid Python 3 regressions. To be able to port the code incrementally, a subset of tests is run on Python 3 to have a working check job. More and more tests are added to the subset each time more code is ported to Python 3. Applications partially ported:

  • Neutron: 99.9%! (7541/7551 tests, only 10 tests remain)
  • Heat: 86% (4386/5119 tests)
  • Cinder: 38% (2532/6659 tests)
  • Nova: 19% (2746/14358 tests)
  • Glance: 18% (512/2779 tests)
  • Horizon: 15% (238/1605 tests)
  • Keystone: 12% (524/4318 tests)

Keystone is still blocked by python-ldap and ldappool. python-ldap may be replaced with pyldap, a fork compatible with Python 3.

3 applications are still at an early stage of the Python 3 port:

  • Designate
  • Manila
  • Swift

No more Python 3 regressions

In previous cycles, it was common to reintroduce code incompatible with Python 3 in files which were already ported to Python 3. We avoid this issue by running unit tests on Python 3 in voting gates: when a test is run on Python 3, Python 2-only code can no longer be introduced into the tested code.

Some projects like Nova load more tests than the subset of tests executed on Python 3, to prevent people from adding Python 3 incompatible syntax and imports.

The Python 3 support is now also required to add a new dependency to OpenStack global requirements.

In general, more and more developers are aware of Python 3 and take it into account in their development.

What’s Next? How can you help?

Since most unit tests of Neutron and Heat are already ported to Python 3, we can expect full support during the next Mitaka cycle. The port of other applications will also continue in parallel.

The next major task is to run functional tests on Python 3. Doug Hellmann wrote a specification Enabling Python 3 for Application Integration Tests. The specification was approved by the PTLs of the different projects. Now it’s time to implement the specification. The first step will be to add an option to DevStack to run clients and some applications on Python 3 (while other applications will still run on Python 2).

You can help to port OpenStack to Python 3 by reviewing patches, writing new patches and testing clients and applications on Python 3. The Python 3 wiki page is the central place to collaborate on this project.


We would like to thank:

  • Andrey Kurilin (Rally)
  • Cyril Roelandt (Neutron)
  • Davanum Srinivas aka dims (Nova)
  • David Stanek (Keystone)
  • George Peristerakis (Horizon)
  • Ihar Hrachyshka (Neutron)
  • Jaivish Kothari aka janonymous (Swift)
  • Jeremy Stanley who coordinated efforts on switching from MySQL-python to PyMySQL
  • Pradeep Kumar Singh (Designate)
  • Sirushti Murugesan (Heat)
  • Victor Sergeyev (Ironic)
  • Victor Stinner aka haypo (Aodh, Ceilometer, Cinder, Glance, Horizon, Nova, Swift)

There are many more developers involved in the Python 3 porting effort. Thanks to all of them!

This article was written by Cyril Roelandt and Victor Stinner.

Liberty cycle retrospective in Puppet OpenStack

Things are moving very fast in OpenStack, so it might be useful to take a short break and write a little retrospective; it will help to see what happened in the Puppet OpenStack project during the last months.


ZooKeeper part 2: building highly available applications, Ceilometer central agent unleashed

The Ceilometer project is in charge of collecting various measurements from the whole OpenStack infrastructure, including the bare metal and virtual level. For instance, we can see the number of virtual machines running or the number of storage volumes.

One of the Ceilometer components is the central agent. Basically, it polls the other resources in order to get some measurements. As the name suggests, it is central, which implies some obvious drawbacks in terms of reliability and scalability.

In this article we will develop an application which mimics the central agent, and then we will study how to improve it with ZooKeeper.

You can download all the samples from here:


The breakable architecture

Historically, the central agent looked something like this:


The architecture was pretty simple: a single agent periodically polls the resources to get the measurements of the OpenStack infrastructure. A resource is an OpenStack component, for instance a compute node, exposing an API used to retrieve the information of interest.

The two obvious drawbacks of this architecture are:

  • We have a single point of failure: if the central agent fails, we cannot retrieve the measurements anymore.
  • We have a bottleneck: the agent works alone, so if the number of resources increases dramatically, the polling mechanism will slow down.

Let's implement this behavior in Python; the code below mimics a central agent.
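A minimal sketch of such an agent could look like this (the resource names and the polling interval are arbitrary, and the real networking is left out):

```python
# Minimal sketch of a "central agent": one process polling every resource.
# Resource names and the polling interval are arbitrary.
import time

RESOURCES = ["compute-1", "compute-2", "compute-3",
             "volume-1", "volume-2", "network-1"]
POLL_INTERVAL = 10  # seconds


def poll(resource):
    # Real code would query the resource API; we only simulate the request.
    print("polling %s" % resource)


def main():
    while True:
        for resource in RESOURCES:
            poll(resource)
        time.sleep(POLL_INTERVAL)


if __name__ == "__main__":
    main()
```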

The code is composed of a set of resources identified by their names and a main loop which periodically sends a poll request to each resource. For the sake of simplicity, the networking with the remote resources is dropped so that we can focus on improving the central agent.

It's worth noting that when performance is not an issue, we can still make the central agent highly available, without developing anything, by using a cluster manager like Pacemaker or Keepalived. Such a tool manages a cluster of machines and monitors the agent: when the agent or its machine fails, the failure is detected and the agent is restarted on another machine.

This architecture results in an active-passive cluster, because at any given time only one central agent is running. Here is the documentation for such a setup with Ceilometer.

The improved architecture

The improved architecture should remove the two drawbacks: we no longer want a single running agent but several agents cooperating as a team, let's call it the Central team :-).

In order to remove the single point of failure and the bottleneck, the Central team should be composed of several agents. Each agent is assigned a set of resources to poll, and two agents always poll two distinct sets of resources.


Having each agent poll a distinct set of resources is a requirement because, in the case of Ceilometer, we don't want to retrieve and store the same result several times in the database.
So far, so good, but how will the agents cooperate? Well, in order to implement the coordination we must answer two questions:

  • What happens if an agent leaves (gracefully or from a crash) or joins the Central team?
  • How do we make sure that each agent polls a distinct set of resources?

We want a dynamic Central team which reacts when a member joins or leaves the team.

More precisely, when a new member joins the team, a set of resources must be assigned to it so that each agent has the same number of resources to poll. It means that the other agents should "give" some resources to poll to the new one.

Conversely, when an agent leaves the team, the others should share its set of resources to poll. The idea is that, in all cases, the agents end up with roughly the same amount of resources to poll.

Okay, it sounds cool, but how is an agent notified that a member joined or left the team? This is where ZooKeeper comes into play 😉 !

Dynamic central team membership with ZooKeeper

Thanks to ZooKeeper we will be able to detect when a member joins or leaves. The idea is to create a znode which represents the team, like "/central_team", and each agent joins the team by creating an ephemeral znode under it.

If you forgot what an ephemeral znode is, go read part 1 of this article :-) !

Having each agent create an ephemeral znode is not sufficient: they must also listen to the events of the parent znode "/central_team", so that when an agent joins the team by creating its entry under "/central_team", ZooKeeper notifies the others.

When an agent leaves the team, it just has to remove its znode; if the agent crashes, ZooKeeper will detect it (because we used an ephemeral znode ;-)) and remove its entry.

In all cases, when the number of znodes under "/central_team" changes, the whole team is notified.

Let’s see how to implement it on top of our little central agent:
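A rough sketch of this logic with the Kazoo client could look like this (the ZooKeeper address is a placeholder, the resource list is arbitrary, and the polling loop is reduced to a print):

```python
# Sketch: a central agent that joins the "/central_team" group with an
# ephemeral znode and watches the membership.
import time
import uuid

from kazoo.client import KazooClient

RESOURCES = ["compute-1", "compute-2", "compute-3", "volume-1"]
TEAM_PATH = "/central_team"


class Agent(object):

    def __init__(self):
        self.id = str(uuid.uuid4())          # unique identifier per agent
        self.members = []
        self.client = KazooClient(hosts="127.0.0.1:2181")

    def _my_watcher(self, event):
        # ZooKeeper watchers fire only once: re-arm the watch while
        # refreshing the member list.
        self._refresh_members()

    def _refresh_members(self):
        self.members = self.client.get_children(TEAM_PATH,
                                                watch=self._my_watcher)
        print("current team: %s" % sorted(self.members))

    def _setup(self):
        self.client.start()
        self.client.ensure_path(TEAM_PATH)
        # Join the team: the znode disappears automatically if we crash.
        self.client.create(TEAM_PATH + "/" + self.id, ephemeral=True)
        self._refresh_members()

    def run(self):
        self._setup()
        while True:
            for resource in RESOURCES:
                print("agent %s polling %s" % (self.id, resource))
            time.sleep(10)


if __name__ == "__main__":
    Agent().run()
```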

The code is pretty straightforward. Since we now have a set of agents, we need to identify them, so we add a unique identifier per agent.

The _setup() method is in charge of starting the Kazoo client and creating the "/central_team" znode. Before polling the resources, each agent creates its ephemeral znode under "/central_team".

Afterwards, the agent retrieves the znodes under "/central_team" in order to get the current members of the team; at the same time it sets a watcher (the _my_watcher() method) on "/central_team" in order to be notified when an event occurs.

It is worth noting that when the watcher is executed, we must set the watcher again on "/central_team", because ZooKeeper watchers are one-time triggers. The best place to do it is in the watcher itself ;-).

Here is an example of the execution of two agents when agent 1 is run before agent 2:

We can see that agent 1 polls the whole set of resources and then receives a notification when agent 2 joins the team. I suggest you do some tests with agents joining and leaving to see how it works.
Let's recap where we are: we now have a team of agents that are notified when a member joins or leaves the team. However, agent 2 also polls the whole set of resources, which is problematic because we want each agent to poll a distinct set of resources. This is the last issue we need to fix :-) !

Dynamic resources partitioning

There are two possible solutions for assigning a distinct set of resources to each agent:

  • Thanks to ZooKeeper we can elect a special agent from the team to be the leader; it will then be in charge of assigning resources to the others.
  • Or we can use a consistent hashing algorithm on the agent side…

In this article we will implement the second solution, because this is what has been done in Ceilometer; the first solution is left as an exercise for the reader 😉 !

Using a consistent hashing algorithm to assign resources to the agents is an elegant solution because, given the team member list, each agent can independently compute its own set of resources. Explaining consistent hashing is beyond the scope of this article, but you can take a look at this explanation.

The basic idea is to hash the id of each resource and the id of each agent; depending on the hashes, we can assign each resource to an agent.

Let's see how it works in Python; here are the added lines.
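A standalone sketch of the partitioning logic could look like this (md5 is only used as a stable hash; in the agent, the member list and the agent id come from self.members and self.id, and the function becomes a method):

```python
# Sketch of resource partitioning with a small hash ring.
import hashlib
from bisect import bisect


def _hash(value):
    # Stable hash used to place agents and resources on the ring.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def _get_my_resources(my_id, members, resources):
    # Every member is placed on a ring according to its hash; a resource
    # belongs to the first member found clockwise from its own hash.
    ring = sorted((_hash(member), member) for member in members)
    hashes = [h for h, _ in ring]
    mine = []
    for resource in resources:
        owner = ring[bisect(hashes, _hash(resource)) % len(ring)][1]
        if owner == my_id:
            mine.append(resource)
    return mine


if __name__ == "__main__":
    agents = ["agent-1", "agent-2"]
    resources = ["resource-%d" % i for i in range(6)]
    for agent in agents:
        print(agent, _get_my_resources(agent, agents, resources))
```

In the run() loop of the previous sketch, iterating over the result of this method instead of the full resource list is then enough: each time the membership changes, the watcher refreshes the member list and the next iteration picks up the new partition.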

The most interesting part is the _get_my_resources() method, which returns the resources assigned to the current agent. It's a little bit tricky if you don't know about consistent hashing, but with some aspirin it will become clear 😉 !

Let's see how it runs for two agents and six resources:

We can see that agent 1 is first assigned the whole set of resources. When agent 2 joins the team, it automatically gets its own distinct set of resources. You can do some tests with more resources and more agents to see what happens when an agent leaves or joins the team.

Thanks to the consistent hashing algorithm, the resource partitioning will be nearly fair between the agents.


To sum up what has been done: we leveraged ZooKeeper to establish group membership among the set of agents and to react when an event happens, and we combined it with a consistent hashing algorithm to partition the resources among the agents.

In this way the Ceilometer central agent moved from a weak architecture to a highly available and scalable one. As I said in the previous article, the real Ceilometer code uses the Tooz API, but conceptually it acts in a similar manner to our little central agent (here is the real patch).

I hope you enjoyed this adventure with ZooKeeper :-) ! As an exercise, you can implement some real resources and use ZooKeeper to detect events (join, leave or failure) so that the agents can adjust their set of assigned resources dynamically.