In this blog post we share benchmark results for OpenStack Swift. Cloudwatt, a company we work with, lent us some servers so that we could benchmark a Swift installation. We'll describe the hardware, the tool and the methodology we used.
SwiftStack develops a great tool called ssbench, designed to benchmark OpenStack Swift; you can find it on GitHub. ssbench is driven by scenario files that let the user describe which operations ssbench's workers will perform against the Swift cluster. Its architecture is very handy: it consists of a master process and one or more worker processes. Workers are connected to the master through a message queue, so you can spread them across many hosts and benchmark large Swift clusters.
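To picture the master/worker split, here is a toy, single-process sketch built on Python threads and `queue.Queue`. It is not ssbench code: real ssbench workers are separate processes, possibly on other hosts, that talk to the master over the network and send HTTP requests to Swift instead of just recording the operation.

```python
import queue
import threading


def run_benchmark(operations, worker_count):
    """Toy master/worker split: the master enqueues operations, workers
    consume them and report results on a second queue."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker(worker_id):
        while True:
            op = tasks.get()
            if op is None:  # sentinel: no more work for this worker
                break
            # A real ssbench worker would send an HTTP request to Swift here;
            # this sketch only records which worker handled which operation.
            results.put((worker_id, op))

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for w in workers:
        w.start()

    # Master side: hand out all operations, then one stop sentinel per worker.
    for op in operations:
        tasks.put(op)
    for _ in workers:
        tasks.put(None)
    for w in workers:
        w.join()

    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


if __name__ == "__main__":
    ops = [("PUT", f"obj{i}") for i in range(10)]
    done = run_benchmark(ops, worker_count=3)
    print(len(done))  # 10
```

Because the queue decouples the master from the workers, adding workers (or, in ssbench's case, worker hosts) does not change the master's logic.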
The cluster consists of 7 DELL R720xd servers, each with 2 x Intel(R) Xeon(R) CPU E5-2630L 0 @ 2.00GHz (24 threads in total) and 32GB of RAM. The servers are connected to a 10Gb switch through 10Gb NICs. Each host has 25 internal hard drives and 12 external ones (in a DELL MD1200 enclosure), and each drive is 1TB.
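The "24 threads" figure and the raw capacity follow from simple arithmetic. The per-CPU core count (6 cores with Hyper-Threading for the E5-2630L) comes from Intel's published specifications rather than from the text above; capacities are raw decimal terabytes, before replication and filesystem overhead.

```python
# Hardware figures from the description above. CORES_PER_SOCKET is taken
# from Intel's specifications for the Xeon E5-2630L, not from the post.
SOCKETS = 2
CORES_PER_SOCKET = 6
THREADS_PER_CORE = 2   # Hyper-Threading
INTERNAL_DRIVES = 25
EXTERNAL_DRIVES = 12   # DELL MD1200 enclosure
DRIVE_TB = 1
HOSTS = 7

threads_per_host = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE
drives_per_host = INTERNAL_DRIVES + EXTERNAL_DRIVES
raw_tb_per_host = drives_per_host * DRIVE_TB
raw_tb_cluster = raw_tb_per_host * HOSTS  # all 7 hosts, raw, no replication

print(threads_per_host, drives_per_host, raw_tb_per_host, raw_tb_cluster)
# 24 37 37 259
```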
The Linux distribution on each host is Debian Wheezy and the OpenStack Swift version is 1.7.5. The Swift configuration is mostly default, using tempauth and a single memcache server.
We ran some tests to get the best performance out of these hosts and concluded that dedicating two of them to the Swift proxy process works best. HAProxy is installed on one of them to spread requests across both proxies. The remaining five hosts are used for account, container and object storage.
The Swift rings were configured with 3 replicas and the following devices:
- 24 devices for storing objects on each server
- 9 devices for storing containers on each server
- 3 devices for storing accounts on each server
Each storage server was sized as follows:
- 17 object server workers
- 8 container server workers
- 3 account server workers
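For readers who want to reproduce a similar layout, the sketch below generates `swift-ring-builder` commands for the three rings using the device counts above. Everything not stated in this post is a hypothetical placeholder: the IP addresses, the `diskN` device names, the partition power (18), `min_part_hours` (1), the equal weights and the one-zone-per-host layout. The ports are the defaults from Swift's sample configuration files of that era (6000 object, 6001 container, 6002 account).

```python
STORAGE_HOSTS = [f"10.0.0.{i}" for i in range(1, 6)]  # hypothetical IPs of the 5 storage nodes
PART_POWER = 18      # hypothetical, not from the benchmark
REPLICAS = 3         # as in the benchmark
MIN_PART_HOURS = 1   # hypothetical
DEVICE_WEIGHT = 100  # hypothetical; equal weights since every drive is 1TB

# ring -> (default bind port in Swift's sample configs, devices per host)
RINGS = {
    "object": (6000, 24),
    "container": (6001, 9),
    "account": (6002, 3),
}


def ring_commands(ring, port, devices_per_host, first_disk):
    """Return swift-ring-builder commands for one ring, one zone per host."""
    builder = f"{ring}.builder"
    cmds = [f"swift-ring-builder {builder} create {PART_POWER} {REPLICAS} {MIN_PART_HOURS}"]
    for zone, ip in enumerate(STORAGE_HOSTS, start=1):
        for i in range(devices_per_host):
            device = f"disk{first_disk + i}"  # hypothetical device naming
            cmds.append(
                f"swift-ring-builder {builder} add z{zone}-{ip}:{port}/{device} {DEVICE_WEIGHT}"
            )
    cmds.append(f"swift-ring-builder {builder} rebalance")
    return cmds


# Give each ring its own, non-overlapping set of drives on every host.
all_cmds = {}
offset = 0
for ring, (port, count) in RINGS.items():
    all_cmds[ring] = ring_commands(ring, port, count, offset)
    offset += count

if __name__ == "__main__":
    for cmds in all_cmds.values():
        print("\n".join(cmds))
```

With one zone per storage node, 5 zones and 3 replicas, the ring builder tries to place each replica of a partition in a different zone, and therefore on a different host.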
Each Swift proxy runs 24 proxy workers, and the HAProxy balancing policy is plain round robin.
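With equal weights, HAProxy's `balance roundrobin` policy essentially hands requests to the two proxy hosts in turn. The minimal illustration below shows that rotation with `itertools.cycle`; the backend names are hypothetical, and real HAProxy round robin also takes server weights and health checks into account, which this sketch ignores.

```python
from itertools import cycle

# Hypothetical backend names for the two proxy hosts.
PROXIES = ["proxy1", "proxy2"]


def round_robin(backends):
    """Return an endless iterator handing out backends in strict rotation,
    approximating HAProxy's round robin with equal weights."""
    return cycle(backends)


if __name__ == "__main__":
    picker = round_robin(PROXIES)
    print([next(picker) for _ in range(4)])
    # ['proxy1', 'proxy2', 'proxy1', 'proxy2']
```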
As mentioned above, ssbench needs a scenario file describing the operations to perform on Swift. We wrote several simple scenarios, each targeting a specific object size and, for each size, one of the main CRUD operations (Create, Read, Update, Delete). Below is the relevant part of one of our scenarios, which only performs create operations on 24KB objects:

    {
      "name": "Pure create scenario",
      "sizes": [{
        "name": "Small files (24K)",
        ...
      }],
      "initial_files": {
        "Small files (24K)": 100
      },
      "crud_profile": [100, 0, 0, 0],
      ...
    }
One of the main difficulties was choosing the right user concurrency (user_count). A value of 20 simulates 20 clients spread evenly across ssbench's worker processes, but how do we know whether our Swift cluster can handle more? To find this concurrency limit, we wrote a small bash wrapper that runs ssbench repeatedly, increasing the concurrency value each time. Once the operations/sec count stays the same between two runs, we stop and keep the last concurrency value as the reference. The wrapper uses a dedicated scenario whose CRUD profile is 25/25/25/25. We then run each of our specific CREATE, READ, UPDATE and DELETE scenarios at this concurrency for 30000 operations.
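The workflow above (search for the concurrency limit, then generate one scenario per object size and operation at that concurrency) can be sketched in Python. Our actual wrapper was a bash script, so this is a re-expression of its logic, not the script itself. `measure` is a hypothetical callable standing in for "run ssbench with the 25/25/25/25 scenario at this user_count and return ops/sec". The step of 10, the 2% tolerance used to decide that two runs are "the same", the size names, 1 KB = 1024 bytes and `initial_files=100` are assumptions. The scenario keys follow the format described in ssbench's README; check them against the version you run.

```python
import json
from typing import Callable, Dict, List

# Object sizes from the benchmark, in KB. Treating 1 KB as 1024 bytes and
# 1MB as 1024 KB is an assumption of this sketch.
SIZES_KB = [24, 32, 64, 128, 256, 512, 1024, 2800]

# crud_profile = [create, read, update, delete] percentages.
PROFILES: Dict[str, List[int]] = {
    "create": [100, 0, 0, 0],
    "read": [0, 100, 0, 0],
    "update": [0, 0, 100, 0],
    "delete": [0, 0, 0, 100],
}


def find_concurrency_limit(
    measure: Callable[[int], float],
    start: int = 10,
    step: int = 10,
    tolerance: float = 0.02,
    max_concurrency: int = 1000,
) -> int:
    """Increase concurrency until two consecutive runs give (nearly) the same
    ops/sec, then return the last concurrency tried."""
    concurrency = start
    previous = measure(concurrency)
    while concurrency + step <= max_concurrency:
        concurrency += step
        current = measure(concurrency)
        if current <= previous * (1 + tolerance):  # no real gain: plateau reached
            return concurrency
        previous = current
    return concurrency


def make_scenario(size_kb: int, operation: str, user_count: int,
                  operation_count: int = 30000, initial_files: int = 100) -> dict:
    """Build one single-size, single-operation scenario in ssbench's JSON layout."""
    size_name = f"{size_kb}K files"  # hypothetical naming scheme
    size_bytes = size_kb * 1024
    return {
        "name": f"Pure {operation} scenario ({size_kb}K)",
        "sizes": [{"name": size_name, "size_min": size_bytes, "size_max": size_bytes}],
        "initial_files": {size_name: initial_files},
        "operation_count": operation_count,
        "crud_profile": list(PROFILES[operation]),
        "user_count": user_count,
    }


def all_scenarios(user_count: int) -> Dict[str, dict]:
    """One scenario per (object size, CRUD operation) pair, keyed by file name."""
    return {
        f"{op}_{kb}k.scenario": make_scenario(kb, op, user_count)
        for kb in SIZES_KB
        for op in PROFILES
    }


if __name__ == "__main__":
    # Made-up stand-in for "run ssbench with the 25/25/25/25 scenario and
    # return ops/sec": throughput stops growing beyond 60 concurrent users.
    def fake_measure(concurrency: int) -> float:
        return min(concurrency, 60) * 100.0

    limit = find_concurrency_limit(fake_measure)
    scenarios = all_scenarios(limit)
    print(limit, len(scenarios))  # 70 32
    print(json.dumps(scenarios["create_24k.scenario"], indent=2))
```

Treating a gain below the tolerance as a plateau avoids waiting for two runs with exactly identical ops/sec, which rarely happens with real measurements.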
The chart below shows the performance reached, in operations per second, for object sizes of 24KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB and 2800KB. The last size corresponds to the average object size stored in Cloudwatt's object store (see the swift-account-stats project). For each object size there are 4 bars, one per operation type. Note that the Y axis uses a logarithmic scale.
Performance remains similar up to 64KB objects and then begins to decrease. For objects up to 64KB, the cluster handles 7000 read ops/second and 3800 create/update ops/second.
This chart shows the same results expressed as bandwidth in MB/second, revealing that we quickly hit the network bandwidth limit for objects of 512KB and larger. The network architecture used for this benchmark was undersized: at the very least we should have used 2 NICs on the proxy hosts, one for client inbound/outbound traffic (ssbench workers) and one for accessing the storage network.
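A quick back-of-the-envelope check shows why large objects hit the network wall first: bandwidth is ops/second multiplied by object size, so the achievable ops/second falls as objects grow. The sketch assumes a single 10 Gbit/s link that each object crosses exactly once, and ignores TCP/HTTP overhead, the three replica copies a proxy sends to storage nodes on every create or update, and the extra hop through HAProxy; all of these lower the real ceiling.

```python
LINK_GBIT = 10  # one 10 Gbit/s NIC


def payload_mb_per_s(ops_per_s, size_kb):
    """Payload throughput in MB/s, counting 1 MB as 1024 KB."""
    return ops_per_s * size_kb / 1024


def max_ops_per_s(size_kb, link_gbit=LINK_GBIT):
    """Upper bound on ops/s when each object crosses one link exactly once.

    Ignores protocol overhead, replica fan-out and the HAProxy hop."""
    link_bytes_per_s = link_gbit * 1e9 / 8  # decimal gigabits -> bytes
    return link_bytes_per_s / (size_kb * 1024)


if __name__ == "__main__":
    print(payload_mb_per_s(7000, 64))  # 437.5
    for size_kb in (64, 512, 2800):
        print(size_kb, round(max_ops_per_s(size_kb)))
    # 64 19073
    # 512 2384
    # 2800 436
```

For example, 7000 read ops/second on 64KB objects amount to about 437.5 MB/s of payload, far below the ceiling of roughly 19000 ops/s for that size, whereas with 512KB objects a single link tops out around 2400 ops/s even under these optimistic assumptions.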
Under heavy load, three resources are heavily stressed: CPU, disk I/O and network bandwidth. Our benchmarks showed that the proxy process is very CPU-intensive, so you need to size your proxy hosts carefully when designing your cluster. On storage nodes, disk performance is crucial: a few badly performing disks can significantly decrease the overall performance of the cluster. Be sure to evaluate hard disks before adding them to your cluster (fio is a good tool for benchmarking storage devices).
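Once you have per-disk results, for instance from fio runs, spotting the weak drives is easy to automate. The helper below flags disks whose throughput falls below a fraction of the median; the 0.7 ratio and the sample numbers are made up for illustration and are not measurements from this benchmark.

```python
from statistics import median


def find_slow_disks(throughput_mb_s, ratio=0.7):
    """Return the names of disks whose throughput is below ratio x median."""
    baseline = median(throughput_mb_s.values())
    return sorted(disk for disk, value in throughput_mb_s.items() if value < ratio * baseline)


if __name__ == "__main__":
    # Made-up per-disk sequential write results (MB/s), e.g. gathered with fio.
    results = {"sdb": 142.0, "sdc": 139.5, "sdd": 61.2, "sde": 144.8}
    print(find_slow_disks(results))  # ['sdd']
```

Comparing against the median rather than the mean keeps a single very slow disk from dragging the baseline down and hiding itself.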