Tue 01 September 2020

Orchestrating applications by (ab)using Ansible's Network XML Parser

As part of our internal security work at Codethink, we've been working on repeatedly deploying and configuring openvas to constantly scan and report on our internal system.

There are various roles out there on ansible-galaxy that will take care of installing this application for you, but their main advantage is the ability to install on platforms we don't use, and the ones we looked over didn't provide any configuration of openvas itself.

Initial installation was easy enough, but when we came to do the configuration we hit an issue - openvas is very... XML. There aren't any existing modules to control openvas' configuration from ansible, so we needed to implement this ourselves.

Typically when we do this, we want our playbooks to remain as idempotent as possible. This means if we're running manual commands, we should be checking the current state, and modifying the state to meet our desired state only when necessary, as this means we're not having to change things and restart services unnecessarily. To do this, we need the ability to look at the output of the command module, check what entries are currently there, and add / remove / edit any entries that need modification.
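As a rough sketch of that check-then-change pattern (the mytool binary, its flags and the entry name are hypothetical, made up purely for illustration), a read-only task registers the current state and a second task only runs when the desired entry is missing:

```yaml
# Generic check-then-change pattern.
# "/usr/bin/mytool", "--list", "--add" and "example-entry" are hypothetical.
- name: Read the current state
  command: "/usr/bin/mytool --list"
  check_mode: no        # read-only, so safe to run even in check mode
  changed_when: no      # reading state never counts as a change
  register: current_state

- name: Add the entry only if it is missing
  command: "/usr/bin/mytool --add example-entry"
  when: "'example-entry' not in current_state.stdout"
```

A plain substring check on stdout like this is fragile for structured output, which is why the rest of this post parses the XML properly instead.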

Our initial assumption was that we could just throw the xml for reading and asserting state through a from_xml filter, and receive a nice object to manipulate at ease, similar to various json-talking applications. Unfortunately this doesn't exist, because XML doesn't really work like the json / yaml / ini files we're used to parsing.


Ansible does have an XML parsing filter though! It was written for the network orchestration subsystem, and is documented as part of it:

To convert the XML output of a network device command into structured JSON output, use the parse_xml filter:

We aren't controlling a networking device, but we do need to parse XML. We can probably use this to make our playbook idempotent! Let's have a go at configuring schedules:

Grabbing some XML

- name: Get all openvas schedules
  command: "/usr/bin/omp --username=adminuser --password=adminuser -p 9390 -X '<get_schedules/>'"
  check_mode: no                  # run even in check mode, as this only reads state
  changed_when: no                # never show as having updated anything
  register: openvas_existing_schedules_raw

This returns us a chunk of XML showing all the schedules currently configured within openvas. I'm not going to paste it here, because it's giant, but we can pull up the specification for the XML in the omp protocol documentation.

We're going to take the xml returned by this and turn it into a dict of objects that we can use within ansible to check what we need to do to assert state. First, we need to identify which bits of this are useful to us. As we plan on only having specific schedules set up, we're going to make an assumption here, namely that if a schedule exists with the correct name, it is the correct schedule. So we need the name. If we're going to modify a schedule, we need its id. We can't remove a schedule that's currently being used, so in_use is useful for us, and the friendly name in comment might be useful for displaying to the user while running the playbook.

parse_xml configuration

Once we've identified the above, we can get on with using the parse_xml filter to... parse the XML. We need to set up a new YAML file containing two entries: keys:, which is the map of how to extract the data from the xml, and vars:, which is the object we'll be creating. These are a bit interleaved, as they reference each other, so keeping them both in the same file appears to be good practice.

For the schedules, we have a file in our role, xml_spec/schedule containing the following:

keys:
  schedules:
    value: "{{ schedule }}"
    top: "schedule"
    items:
      name: name
      comment: comment
      id: ".[@id]"
      in_use: in_use
vars:
  schedule:
    key: "{{ item.name }}"
    values:
      name: "{{ item.name }}"
      comment: "{{ item.comment }}"
      in_use: "{{ item.in_use == 1 }}"
      id: "{{ item.id.get('id') }}"

keys: is being used here to define how to extract data from the XML:

  • value is the object described in vars: we'll be unpacking the XML into, in this case schedule - this is rendered as a jinja template when doing the parsing
  • top is an xpath expression pointing to the element in the xml which contains the elements we want to turn into our dict. In this case, schedule searches under the root node for the element <schedule>...</schedule>
  • items is a dict of items within those elements we want to extract. These can be specified using xpath expressions as above.
      • name: name, comment: comment and in_use: in_use all refer directly to the tags of elements within our schedule element
      • id: ".[@id]" is an xpath expression that grabs the id attribute of the top level element, in our case the schedule element

vars: is being used to describe the objects we'll be unpacking the XML into, in this case schedule. They unpack into a top-level dict, containing our specified object as a sub-dict.

  • key is the key that should be used in the top level dict for each object. In this case, we're setting it to item.name, which refers to the name field of the schedule we set up in keys:

  • values are the values we're loading into each dict entry. We're using jinja expressions to extract them, so for simple text elements like name and comment we can grab them directly from item, and for in_use we're using a comparison to have a boolean available in our dict. id is a bit more complicated as we have to extract the actual id from the xpath return object.
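To make the shape concrete, here is roughly what the parsed result might look like for a system with a single, unused schedule called daily. This is an illustration rather than captured output: the id is a placeholder, and the exact value types depend on how the filter renders each template.

```yaml
# Illustrative shape only, not real output from openvas.
schedules:                 # the name we gave our entry under keys:
  daily:                   # from key: "{{ item.name }}"
    name: daily
    comment: "Daily @ 2am"
    in_use: false
    id: "00000000-0000-0000-0000-000000000000"   # placeholder id
```

Keying the dict by name is what lets later tasks check for a schedule with a plain "in" test.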

Actually using parse_xml

- name: Parse XML to extract schedules
  set_fact:
    openvas_existing_schedules: "{{ (openvas_existing_schedules_raw.stdout | parse_xml('roles/openvas/xml_spec/schedule'))['schedules'] }}"

This takes the stdout from the command we ran previously, filters it using our schedule specification, and extracts the schedules key.
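While developing the spec file it can help to print the parsed result and check it looks the way you expect. A minimal sketch, assuming the fact is named openvas_existing_schedules as above:

```yaml
# Debugging aid: print the dict produced by parse_xml.
- name: Show parsed schedules
  debug:
    var: openvas_existing_schedules
```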

We only want two schedules for our system: a daily run at 2am, and a weekly run at 6am on Sunday. We can now use the above variable to create our new schedules if they don't already exist, and remove any existing schedules that shouldn't exist:

- name: Create daily schedule if it doesn't already exist
  command: "{{ omp_command }} -X '<create_schedule><name>daily</name><comment>Daily @ 2am</comment><first_time><day_of_month>7</day_of_month><hour>2</hour><minute>0</minute><month>6</month><year>2020</year></first_time><duration>6<unit>hour</unit></duration><period>1<unit>day</unit></period></create_schedule>'"
  when:
    - "'daily' not in openvas_existing_schedules"

- name: Create weekly schedule if it doesn't already exist
  command: "{{ omp_command }} -X '<create_schedule><name>weekly</name><comment>Weekly @ 6am Sunday</comment><first_time><day_of_month>7</day_of_month><hour>6</hour><minute>0</minute><month>6</month><year>2020</year></first_time><duration>12<unit>hour</unit></duration><period>7<unit>day</unit></period></create_schedule>'"
  when:
    - "'weekly' not in openvas_existing_schedules"

- name: Remove any schedules not listed above
  command: "{{ omp_command }} -X '<delete_schedule schedule_id=\"{{ item.value.id }}\"/>'"
  when:
    - item.key not in ['daily', 'weekly']
  loop: "{{ openvas_existing_schedules | dict2items }}"
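The tasks above use an omp_command variable that isn't defined anywhere in this post. One plausible definition, assuming it simply holds the omp invocation and credentials from the first task, would be something like this in the role's vars or defaults:

```yaml
# Assumed definition: reuses the command line from the get_schedules task.
omp_command: "/usr/bin/omp --username=adminuser --password=adminuser -p 9390"
```

In a real deployment the credentials would more sensibly come from a vaulted variable than sit in plain text.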


As you can see above, the parse_xml filter is a significantly more useful addition to the ansible toolbox than the documentation would have you believe. Given our occasional need to deploy xml-speaking applications using modern devops tooling, I'm quite glad to have it available!

Photo by Richard Clyborne of Music Strive
