Adding new 3rd party tools in Fio

In my daily work at eNovance I enjoy benchmarking, but before running any serious benchmark it’s important to check the sanity of the system.

This article is about understanding how your local storage devices behave, and about the tools that help you do so.

Understanding your environment

Today’s server density is pretty high: a 2U Dell R720xd, for example, can hold 24 internal disks. You can add even more disks by attaching external storage. I’ve met such a configuration at a partner (CloudWatt), with a total of 36 disks attached to a single physical server (24 x 2.5″ internal disks + 12 x 3.5″ external disks).

But how can you run a benchmark on such a big beast without understanding:

  • how every single disk performs?
  • how the controller behaves when IOs hit all the disks at the same time?
  • whether any disk is not performing as expected?

To answer all those questions, I use fio.

Using Fio to benchmark storage devices

fio, written by Jens Axboe, is clearly my favourite storage benchmark tool.

It was designed to be versatile and features:

  • many IO engines (sync, mmap, libaio, posixaio, SG v3, splice, null, network, syslet, guasi, solarisaio, and more)
  • complex IO patterns (controlling read/write and sequential/random distribution)
  • job control files
  • support for both file and block device storage targets
  • multiple jobs, either threaded or forked
  • {latency|iops|bandwidth} performance traces that can be used later for complementary analysis.

fio runs on many operating systems and is, of course, free software released under the GPL.

Introducing genfio

Even though fio is a very versatile tool, it’s pretty tedious to write a configuration file for testing 36 disks. As far as I know, no benchmark tool provides helpers to ease the configuration of such a large test.

genfio is a 3rd party tool included in fio since release 2.1.2. A single, very simple command line generates big job files.

For this 36-disk case, genfio lets you create a single job file that will:

  • test one disk at a time
  • test all disks at the same time
  • run sequential read, sequential write, random read and random write patterns
  • iterate over a list of block sizes you define
  • use a time-based runtime for each test
  • write separate IOPS and bandwidth log files for each test
  • run with or without caching

As an example, you can have the following command line :

As shown in this example, the generated fio file is made of 48 individual jobs of 300 seconds each, so the complete benchmark lasts 4 hours (48 x 300 s = 14,400 s). A single execution of this fio job file runs all the tests without any human input.
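To make the idea of a generated job file more concrete, here is a minimal Python sketch of how such a file could be built. It is not genfio's implementation: the function name `build_job_file`, the job naming scheme and the device paths are assumptions made for illustration. The fio options it emits (`ioengine`, `direct`, `time_based`, `runtime`, `rw`, `bs`, `filename`, `write_bw_log`, `write_iops_log`, `stonewall`) are regular job file options.

```python
from itertools import product

# Values accepted by fio's rw= option for the four patterns discussed above.
PATTERNS = ("read", "write", "randread", "randwrite")


def build_job_file(disks, block_sizes, runtime=300, parallel=False):
    """Return (job_file_text, phases) for the given disks and block sizes.

    A "phase" is a group of jobs that run at the same time; phases run
    one after the other, so total runtime is roughly phases * runtime.
    WARNING: write patterns on raw devices destroy the data they hold.
    """
    lines = [
        "[global]",
        "ioengine=libaio",
        "direct=1",  # bypass the page cache
        "time_based",
        f"runtime={runtime}",
        "",
    ]
    phases = 0
    for pattern, bs in product(PATTERNS, block_sizes):
        for index, disk in enumerate(disks):
            # Hypothetical naming scheme: <device>-<pattern>-<block size>
            name = f"{disk.split('/')[-1]}-{pattern}-{bs}"
            if parallel:
                name += "-para"
            lines += [
                f"[{name}]",
                f"filename={disk}",
                f"rw={pattern}",
                f"bs={bs}",
                f"write_bw_log={name}",
                f"write_iops_log={name}",
            ]
            # stonewall makes fio wait for the preceding jobs to finish.
            # Sequential mode: every job waits. Parallel mode: only the
            # first job of each group waits, the others start with it.
            if not parallel or index == 0:
                lines.append("stonewall")
                phases += 1
            lines.append("")
    return "\n".join(lines), phases


text, phases = build_job_file(["/dev/sda", "/dev/sdb"], ["4k", "128k"])
print(text)
print(f"{phases} phases, about {phases * 300 / 3600:.1f} hours")
```

The same arithmetic gives the figure quoted above: 48 phases of 300 seconds add up to 14,400 seconds, which is 4 hours.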

Pros:

  • no need for boring copy/paste to get numerous jobs
  • no risk of human typos or mistakes
  • no need for human input to run very long jobs
  • automated job naming
  • automated log file naming
  • automated execution of numerous jobs

Cons:

  • not all fio features are available yet

Plotting the result

After running a complex fio job, a great amount of data is generated. Understanding how the benchmark went is pretty complex because:

  • numerical values (average, min, max, standard deviation) don’t give a complete overview

    • if the storage device stops serving IOs for a short time, the metrics don’t report it in an obvious way
    • plotting a huge amount of data takes a lot of time to set up properly
  • graphical plotting doesn’t always provide the right view

    • plotting 36 traces on the same graph isn’t easy to read
    • estimating the average bandwidth from a graph isn’t easy either

To help with plotting massive fio outputs, the fio2gnuplot tool was developed and contributed to fio; it has been included since release 2.1.2.

fio2gnuplot selects fio’s output files by using regular expressions, parses them and generates a series of graphs by using gnuplot.

To understand how every single part of the bench ran, a series of graphs is generated for each trace file:

  • a raw graph to show the real values reported during the test
  • a smooth graph using CSplines plotting to make the performance easier to read
  • a Bezier-based graph to see the trends (performance peaks are typically hidden)
  • a green line printed on each kind of graph to report the average value during the benchmark
  • automatic naming of axes and title according to the type of benchmark (IO vs bandwidth, block size reporting)
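The three renderings listed above map onto gnuplot's `smooth csplines` and `smooth bezier` plot options. The sketch below builds a small gnuplot script that overlays them on one graph for brevity; it is an illustration, not the script fio2gnuplot actually generates. The function name `gnuplot_script`, the log file name and the PNG terminal are assumptions, and the log is assumed to be comma-separated with the time in column 1 and the value in column 2.

```python
def gnuplot_script(log_file, average, output="trace.png"):
    """Build a gnuplot script drawing raw, smooth and trend curves."""
    header = [
        "set terminal png size 1280,720",
        f"set output '{output}'",
        "set datafile separator ','",
        "set xlabel 'Time (ms)'",
    ]
    curves = [
        # Raw values exactly as logged.
        f"'{log_file}' using 1:2 with lines title 'raw'",
        # Cubic splines: smoother, still follows the data closely.
        f"'{log_file}' using 1:2 smooth csplines title 'smooth'",
        # Bezier: shows the trend, hides short peaks.
        f"'{log_file}' using 1:2 smooth bezier title 'trend'",
        # A constant function draws the horizontal average line.
        f"{average} with lines linecolor rgb 'green' title 'average'",
    ]
    # A trailing backslash continues the plot command on the next line.
    plot = "plot " + ", \\\n     ".join(curves)
    return "\n".join(header + [plot]) + "\n"


# Hypothetical log file name and average value.
print(gnuplot_script("sda-read-4k_bw.log", 100.0))
```

The resulting text can be saved to a file and passed to gnuplot, which makes it easy to compare how much detail each smoothing mode hides.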

To understand how the bench ran as a whole, a series of graphs is generated to compare the various trace files:

  • a raw graph to show the real values reported during the test
  • a smooth graph using CSplines plotting to make the performance easier to read
  • a Bezier-based graph to see the trends (performance peaks are typically hidden)
  • a green line printed on each kind of graph to report the average value during the benchmark
  • automatic naming of axes and title according to the type of benchmark (IO vs bandwidth, block size reporting)
  • a 3D graph reporting a surface
    • the flatter the surface is, the closer the performances are
  • a bar graph comparing the following values computed from each trace
    • Average value
    • Min value
    • Max value
    • Standard Deviation
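The four values compared in the bar graph are easy to compute from a trace. Here is a minimal sketch, assuming each log line is comma-separated and starts with a time in milliseconds followed by the measured value; the function name `summarize` and the sample lines are made up for illustration, and fio2gnuplot may compute its standard deviation differently (the population form is used here).

```python
import statistics


def summarize(log_lines):
    """Return average, min, max and standard deviation of a trace."""
    values = []
    for line in log_lines:
        fields = line.split(",")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        values.append(float(fields[1]))
    if not values:
        raise ValueError("no samples found in trace")
    return {
        "avg": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "stddev": statistics.pstdev(values),  # population std deviation
    }


# Made-up samples: time (ms), value, direction, block size
sample = [
    "1000, 100, 0, 4096",
    "2000, 110, 0, 4096",
    "3000, 90, 0, 4096",
    "4000, 100, 0, 4096",
]
print(summarize(sample))
```

Running this on every trace file gives one set of four numbers per disk, which is the data a bar graph needs.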

Bar graphs are very useful to make a quick comparison between many traces while not having the readability issues of plotting many lines on a single graph.

Sample plots are available at the end of this article.

On the one hand, fio2gnuplot generates many files, and many of them may be useless depending on the kind of information you are interested in. On the other hand, fio2gnuplot produces the same series of graphs automatically and very easily, letting users concentrate on reading the most interesting ones.

With fio2gnuplot, graphing fio’s output becomes easier and quicker. Every fio user benefits from graphical rendering without needing to understand how to plot the data with gnuplot.

[Figures: 3D plotting; plotting 10 traces on a single graph; bar graphing 36 disks]


  1. Pingback: OpenStack Community Weekly Newsletter (Sep 6-13) » The OpenStack Blog

  2. Pingback: OpenStack Community Weekly Report (9.12 – 9.19) | UnitedStack Inc.

  3. Hi Erwan,

    I have started using fio for some of my IO benchmarking projects. So far I am using the inbuilt fio_generate_plots tool to plot basic bandwidth and latency graphs. However, the graphs I get seem pretty basic compared to the examples you have posted.

    How can I install fio2gnuplot? I did some searching and did not find any resources.

  4. Thanks for the explanation Erwan. I have started using fio2gnuplot and for single thread (numjobs=1) fio configs, the results are accurate.

    Have you tried running fio2gnuplot for multiple threads, i.e. numjobs=4? I am getting garbled/unclear plots. The commands I am using are:

    fio2gnuplot.py -g -i
    fio2gnuplot.py -g -b

  5. Hi Erwan,
    Regarding the results, it is very sad that you did not do a deeper analysis.
    If you could explain what motivated you to use data-destructive cspline and bezier smoothing instead of standard filtering tools, I would be happy to read it!
    Maybe you ran an FFT on your results and the output was not relevant?

    I encourage you to make a second post focused on true analysis.
    Thanks again for sharing your data.

    Yuhan.

    ps: sharing raw data would be awesome

    • Let me clarify this blog post. It was written to present a tool and how it works, with a couple of sample outputs to help people understand its features.

      This blog post is not about analyzing a particular benchmark, it’s really about presenting a tool.

      Regarding your concern about the “true analysis”, you surely missed that the tool is able to provide a _raw_ graph _or_ a cspline _or_ a bezier rendering.

      It’s really up to the users to decide which output they want depending on their needs.

      Really, this tool is about turning the text output of the wonderful fio tool into something easier to read through a graphical output, while offering several renderings.

      I hope this clarifies the blog post and answers your concerns.

      Erwan

  6. Pingback: FIO: Bench IO disks | Deimosfr Blog