Time to roll up some notes on the status of Ceph in TripleO. Most of this functionality was available in the Mitaka release too, but the examples below were run with code from the Newton release, so they might not apply identically to Mitaka.
The TripleO default configuration
No default is going to fit everybody, but knowing what the default gives us is the starting point for improving on it. So let's try and see:
uc$ openstack overcloud deploy --templates tripleo-heat-templates \
  -e tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e tripleo-heat-templates/environments/storage-environment.yaml \
  --ceph-storage-scale 1
Deploying templates in the directory /home/stack/example/tripleo-heat-templates
...
Overcloud Deployed
Monitors go on the controller nodes, one per node; the above command deploys a single controller though. The first interesting thing to point out is:
oc$ ceph --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
Jewel! Kudos to Emilien for bringing support for it in puppet-ceph. Continuing our investigation, we notice the OSDs go on the cephstorage nodes and are backed by the local filesystem, as we didn't tell TripleO to do otherwise:
oc$ ceph osd tree
ID WEIGHT  TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.03999 root default
-2 0.03999     host overcloud-cephstorage-0
 0 0.03999         osd.0                         up  1.00000          1.00000
Notice we got SELinux covered:
oc$ ls -laZ /srv/data
drwxr-xr-x. ceph ceph system_u:object_r:ceph_var_lib_t:s0 .
...
And use CephX with autogenerated keys:
oc$ ceph auth list
installed auth entries:

client.admin
    key: AQC2Pr9XAAAAABAAOpviw6DqOMG0syeEYmX2EQ==
    caps: [mds] allow *
    caps: [mon] allow *
    caps: [osd] allow *
client.openstack
    key: AQC2Pr9XAAAAABAAA78Svmmt+LVIcRrZRQLacw==
    caps: [mon] allow r
    caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics
But which OpenStack services are using Ceph? The storage-environment.yaml file has some information:
uc$ grep -v '#' tripleo-heat-templates/environments/storage-environment.yaml | uniq

resource_registry:
  OS::TripleO::Services::CephMon: ../puppet/services/ceph-mon.yaml
  OS::TripleO::Services::CephOSD: ../puppet/services/ceph-osd.yaml
  OS::TripleO::Services::CephClient: ../puppet/services/ceph-client.yaml

parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: true
  CinderBackupBackend: ceph
  NovaEnableRbdBackend: true
  GlanceBackend: rbd
  GnocchiBackend: rbd
The registry lines enable the Ceph services, while the parameters set Ceph as the backend for Cinder, Nova, Glance and Gnocchi. They can be configured to use other backends; see the comments in the environment file. Regarding the pools:
oc$ ceph osd lspools
0 rbd,1 metrics,2 images,3 backups,4 volumes,5 vms,
The replica size is set to 3 by default, but we deployed only a single OSD, so the cluster will never get into HEALTH_OK:
oc$ ceph osd pool get vms size
size: 3
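To see this from the cluster itself, the status commands will keep reporting a degraded state until enough OSDs join to satisfy the replica size:

oc$ ceph health    # stays in a non-OK state while the pools cannot satisfy size=3
oc$ ceph pg stat   # no placement group can become active+clean with a single OSD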
Good to know. Now on to a new deployment with more interesting stuff.
A more realistic scenario
What makes it "more realistic"? We'll have enough OSDs to cover the replica size. We'll use physical disks for our OSDs (and journals) and not the local filesystem. We'll cope with a node with a different disk topology, and we'll decrease the replica size for one of the pools.
Set a default disk map for the OSD nodes
Define a default configuration for the storage nodes, telling TripleO to use sdb for the OSD data and sdc for the journal:
ceph_default_disks.yaml

parameter_defaults:
  CephStorageExtraConfig:
    ceph::profile::params::osds:
      /dev/sdb:
        journal: /dev/sdc
Customize the disk map for a specific node
For the node which has two (instead of a single) rotary disks, we'll need a specific map. First get its system-uuid from the Ironic introspection data:
uc$ openstack baremetal introspection data save | jq .extra.system.product.uuid
"66C033FA-BAC0-4364-9E8A-3184B5952370"
then create the node-specific map:
ceph_mynode_disks.yaml

resource_registry:
  OS::TripleO::CephStorageExtraConfigPre: tripleo-heat-templates/puppet/extraconfig/pre_deploy/per_node.yaml

parameter_defaults:
  NodeDataLookup: >
    {"66C033FA-BAC0-4364-9E8A-3184B5952370":
      {"ceph::profile::params::osds":
        {"/dev/sdb": {"journal": "/dev/sdd"},
         "/dev/sdc": {"journal": "/dev/sdd"}
        }
      }
    }
Fine-tune pg_num, pgp_num and replica size for a pool
Finally, to override the replica size (and, why not, the number of PGs) of the "vms" pool (where the Nova ephemeral disks go by default):
ceph_pools_config.yaml

parameter_defaults:
  CephPools:
    vms:
      size: 2
      pg_num: 128
      pgp_num: 128
Zap all disks for the new deployment
We also want to clear and prepare all the non-root disks with a GPT label, which will allow us, for example, to repeat the deployment multiple times reusing the same nodes. The implementation of the disk cleanup script can vary, but we can use a sample script and wire it to the overcloud nodes via NodeUserData:
uc$ curl -O https://gist.githubusercontent.com/gfidente/42d3cdfe0c67f7c95f0c/raw/1f467c6018ada194b54f22113522db61ef944e20/ceph_wipe_disk.yaml

ceph_wipe_env.yaml

resource_registry:
  OS::TripleO::NodeUserData: ceph_wipe_disk.yaml

parameter_defaults:
  ceph_disks: "/dev/sdb /dev/sdc /dev/sdd"
All the above environment files could have been merged into a single one, but we split them into multiple files for clarity. Now the new deploy command:
uc$ openstack overcloud deploy --templates tripleo-heat-templates \
  -e tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e tripleo-heat-templates/environments/storage-environment.yaml \
  --ceph-storage-scale 3 \
  -e ceph_pools_config.yaml \
  -e ceph_mynode_disks.yaml \
  -e ceph_default_disks.yaml \
  -e ceph_wipe_env.yaml
Deploying templates in the directory /home/stack/example/tripleo-heat-templates
...
Overcloud Deployed
Here is our OSD tree, with two OSDs running on the node with two rotary disks (sharing the same journal disk):
oc$ ceph osd tree
ID WEIGHT  TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.03119 root default
-2 0.00780     host overcloud-cephstorage-1
 0 0.00780         osd.0                         up  1.00000          1.00000
-3 0.01559     host overcloud-cephstorage-2
 1 0.00780         osd.1                         up  1.00000          1.00000
 2 0.00780         osd.2                         up  1.00000          1.00000
-4 0.00780     host overcloud-cephstorage-0
 3 0.00780         osd.3                         up  1.00000          1.00000
and the custom PG/size values for the "vms" pool:
oc$ ceph osd pool get vms size
size: 2
oc$ ceph osd pool get vms pg_num
pg_num: 128
Another simple customization could have been to set the journal size. For example:
ceph_journal_size.yaml

parameter_defaults:
  ExtraConfig:
    ceph::profile::params::osd_journal_size: 1024
We also did not provide any customization for the crushmap, but a recent addition from Erno makes it possible to disable global/osd_crush_update_on_start, so that any crushmap customization becomes possible after the deployment is finished; a sketch of how such a flag could be passed follows.
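A minimal sketch of an environment file for that, assuming the flag is pushed down as a raw ceph.conf setting through the hiera ExtraConfig mechanism (the ceph_crush_env.yaml name is just an example, and puppet-ceph may also expose a dedicated parameter for this):

ceph_crush_env.yaml

parameter_defaults:
  ExtraConfig:
    ceph::conf::args:
      global/osd_crush_update_on_start:
        value: 'false'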
We also did not deploy the RadosGW service, as it is still a work in progress expected for the Newton release; the submissions for its inclusion are under review.
We're also working on automating the upgrade from the Ceph/Hammer release deployed with TripleO/Mitaka to Ceph/Jewel, installed with TripleO/Newton. The process will be integrated with the OpenStack upgrade and, again, the submissions are under review in a series.
For more scenarios
The composable roles mechanism recently introduced in TripleO, discussed in Steven's blog post, also makes it possible to test a complete Ceph deployment with a single controller node (hosting the OSD service as well), just by adding OS::TripleO::Services::CephOSD to the list of services deployed on the controller role, as sketched below.
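A hedged sketch of such an environment file, assuming the Newton-era ControllerServices parameter is overridden; the full default list of controller services must be carried over, only the Ceph entries are shown here and the file name is just an example:

ceph_osd_on_controller.yaml

parameter_defaults:
  ControllerServices:
    - OS::TripleO::Services::CephMon
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephOSD
    # ... plus all the other services from the default ControllerServices list in overcloud.yaml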
And if the above still wasn't enough, TripleO continues to support configuration of OpenStack with a pre-existing, unmanaged Ceph cluster. To do so we'll want to customize the parameters in puppet-ceph-external.yaml and deploy passing that environment file as an argument instead. For example:
puppet-ceph-external.yaml

resource_registry:
  OS::TripleO::Services::CephExternal: tripleo-heat-templates/puppet/services/ceph-external.yaml

parameter_defaults:
  # NOTE: These example parameters are required when using Ceph External
  # and must be obtained from the running cluster
  #CephClusterFSID: '4b5c8c0a-ff60-454b-a1b4-9747aa737d19'
  #CephClientKey: 'AQDLOh1VgEp6FRAAFzT7Zw+Y9V6JJExQAsRnRQ=='
  #CephExternalMonHost: '172.16.1.7, 172.16.1.8'

  # the following parameters enable Ceph backends for Cinder, Glance, Gnocchi and Nova
  NovaEnableRbdBackend: true
  CinderEnableRbdBackend: true
  CinderBackupBackend: ceph
  GlanceBackend: rbd
  GnocchiBackend: rbd

  # If the Ceph pools which host VMs, Volumes and Images do not match these
  # names OR the client keyring to use is not named 'openstack', edit the
  # following as needed.
  NovaRbdPoolName: vms
  CinderRbdPoolName: volumes
  GlanceRbdPoolName: images
  GnocchiRbdPoolName: metrics
  CephClientUserName: openstack

  # finally we disable the Cinder LVM backend
  CinderEnableIscsiBackend: false
Come help in #tripleo @ freenode and don't forget to check the docs at tripleo.org! Some related topics are described there, for example how to set the root device via Ironic for the nodes with multiple disks, or how to push additional arbitrary settings into ceph.conf.