Data Presentation Example: Comparing Distributions

Histograms are a standard way of looking at a data distribution. Often, one needs to compare data distributions (e.g., in HEP one often compares distributions obtained from simulated signal with those obtained from simulated background). The standard way to do this is to overlay the two histograms directly on one plot.

  • simple histogram overlay:

This image was generated using ROOT, which is the standard analysis tool used in HEP. It already improves on the default output generated by ROOT, since I
  • thickened the lines used to draw the data to better emphasize the data
  • changed the color of the axes in order to de-emphasize them relative to the data
However, that is about as much as I could do without putting in a great deal more effort (which I did put in for the following plots).

But just looking at overlays can make it very difficult to see the actual differences between the distributions. Two possible ways of showing the differences are to plot the bin-by-bin difference (i.e. histogram_1 - histogram_2) or the bin-by-bin ratio (i.e. histogram_1 / histogram_2).

  • overlay with difference:

  • overlay with ratio:
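
As a rough illustration of what the two lower plots contain, the following minimal Python sketch computes the bin-by-bin difference and ratio of two histograms. It is not taken from ratioPlot.C; the function names and the example bin contents are made up for illustration, and it assumes both histograms share the same binning. Bins where the denominator is empty are reported as None rather than guessed.

```python
from typing import List, Optional


def bin_difference(h1: List[float], h2: List[float]) -> List[float]:
    """Return h1 - h2 for each bin (both histograms must share binning)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return [a - b for a, b in zip(h1, h2)]


def bin_ratio(h1: List[float], h2: List[float]) -> List[Optional[float]]:
    """Return h1 / h2 for each bin, or None where h2 is empty."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return [a / b if b != 0 else None for a, b in zip(h1, h2)]


# Hypothetical bin contents, chosen only to make the arithmetic easy to follow.
signal = [2.0, 8.0, 15.0, 8.0, 2.0]
background = [4.0, 8.0, 10.0, 4.0, 0.0]

print(bin_difference(signal, background))
print(bin_ratio(signal, background))
```

The difference keeps the units of the original histograms, while the ratio is dimensionless and undefined wherever the second histogram is empty, which is one reason the two views can look quite different.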

In both figures, the upper plot is the overlaid distribution while the one below it gives a different way of seeing the differences. I have attempted several improvements:

  • The top histograms are now filled in with a 'softer' version of the color used to draw that histogram (this avoids 'tiring' the eyes when an image has large areas of highly saturated color). The fill color is meant to better emphasize which distribution is larger for a particular bin. The outline of the histogram is drawn on top of the fill color so that both distributions can be seen at all times. (This required drawing 4 histograms, 2 that were filled and 2 that were just outlines)
  • The bottom histograms use the same color coding as the top ones to reinforce the link between them
  • The top histograms are the 'primary' data, so their background color has a higher contrast with the foreground color in order to draw your attention
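
One simple way to obtain a 'softer' version of a line color is to blend it toward white. The sketch below is only an assumption about how such a fill color could be derived; the original plots may have picked their colors by hand, and the function name and the blend amount of 0.6 are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def soften(rgb: RGB, amount: float = 0.6) -> RGB:
    """Blend an 8-bit RGB color toward white.

    amount = 0 returns the color unchanged, amount = 1 returns pure white.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    return tuple(round(c + (255 - c) * amount) for c in rgb)


# Hypothetical outline colors for the two histograms.
red_line = (255, 0, 0)
blue_line = (0, 0, 255)

print(soften(red_line))   # fill color for the first histogram
print(soften(blue_line))  # fill color for the second histogram
```

Drawing the outline in the saturated color and the fill in the softened one keeps the link between the two while avoiding large saturated areas.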

The file ratioPlot.C contains a ROOT macro (i.e. a workflow) which can almost recreate the ratio plot. I did not have the patience to figure out all the commands I would have to give to do an exact recreation. ROOT will generate a macro based on a histogram you have 'edited', but that macro embeds the histogram data along with the commands that set up the presentation.

Some potential further improvements:
  • Label the bottom box to say what is being shown
  • Try using partial transparency for the fill color to allow the 'distribution behind' to be seen through the other one (rather than doing a two-phase rendering)
  • Have the top and bottom plot share the same X axis (I could not figure out how to do that with the tool I was using)
  • Add error bars to the lower distributions to show the significance of the difference
  • Try a comparison plot which is a ratio of the error on the difference to the difference. Such a plot should show the significance of the differences between the distributions
  • The Y axis of the ratio plot should be done logarithmically so that a/b and b/a appear to have the same displacement from 1
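
The last three suggestions can be sketched numerically. The Python below is a minimal illustration, not part of the original macro. It assumes each bin holds an independent, unweighted Poisson count, so the error on a difference n1 - n2 is sqrt(n1 + n2); weighted or normalized histograms would need their own error treatment. Note that it computes the difference divided by its error, which is the inverse of the ratio described in the list and is one common way of expressing significance. All function names are hypothetical.

```python
import math
from typing import Optional, Tuple


def difference_with_error(n1: float, n2: float) -> Tuple[float, float]:
    """Difference of two bin counts and its error.

    Assumes independent Poisson counts, so variances add: var = n1 + n2.
    """
    return n1 - n2, math.sqrt(n1 + n2)


def significance(n1: float, n2: float) -> Optional[float]:
    """Difference in units of its own error, or None if both bins are empty."""
    diff, err = difference_with_error(n1, n2)
    return diff / err if err > 0 else None


def log_ratio(n1: float, n2: float) -> Optional[float]:
    """log10(n1 / n2), or None when either bin is empty.

    On this scale a/b and b/a sit at equal distances on either side of zero.
    """
    if n1 <= 0 or n2 <= 0:
        return None
    return math.log10(n1 / n2)


# Hypothetical bin contents.
print(difference_with_error(45, 36))
print(significance(45, 36))
print(log_ratio(20, 10), log_ratio(10, 20))
```

On a linear ratio axis, 20/10 = 2 lies a full unit above 1 while 10/20 = 0.5 lies only half a unit below it. Taking the logarithm removes that asymmetry, which is the motivation for the last item in the list.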

-- ChrisDJones - 24 Aug 2006
Topic revision: r4 - 30 Aug 2006, ChrisDJones