
Forgive the statistical ignorance, but how would one display mean and std dev in a useful graphical form? (Asking seriously, as it would be good to know which setting to use in gnuplot.)


Not quite the same, but box-and-whisker plots (https://en.wikipedia.org/wiki/Box_plot) are good for visualising distributions. They show the mean, minimum, maximum and the first and third quartiles.

Since they show the minimum and maximum, we can judge the overall spread; the mean tells us... well... the mean; and the first and third quartiles let us judge how closely the data is clustered around the mean.

See http://gnuplot.sourceforge.net/demo/candlesticks.html for gnuplot examples.


A minor point, but I think box plots usually show the median rather than the mean at the centre.

It would make sense perhaps to use standard deviations and means for some data, but in cases like this I think quartiles and medians make more sense.
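To make the difference concrete, here is a minimal sketch in Python (standard library only) that computes both summaries for the same data. The sample values are made up for illustration, and "inclusive" is just one of several common quartile conventions, so other tools may give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max): the values a typical box plot draws."""
    data = sorted(values)
    # "inclusive" treats the data as the whole population; other quartile
    # conventions exist and can give slightly different Q1/Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def mean_and_stdev(values):
    """Return (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)

# Made-up timings with a single large outlier.
samples = [10, 11, 12, 12, 13, 14, 15, 16, 90]

print("min, Q1, median, Q3, max:", five_number_summary(samples))
print("mean, std dev:", mean_and_stdev(samples))
```

With one outlier in the data, the mean lands above the third quartile while the median and quartiles barely move, which is the usual argument for quartiles when the data is skewed.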


But box plots seem to need turning on their side to visualise (assuming a normal distribution).

I suppose I am looking for something mathematically faithful that still has OmniGraffle good looks.

Cheers


Personally I prefer histograms (assuming you have all the data, not just mean and std dev).

For example, here's a histogram of ages of passengers of the Titanic: https://www.statwing.com/demos/titanic#workspaces/21

Or rents in the Bay Area: https://www.statwing.com/open/datasets/034b4af16d50a0ad35f1a...

Gives a real nice feel for the data.

These histograms are lacking in that they don't actually have the mean line or std devs on the plot itself; my general point is that if the goal is to get across a distribution, a histogram is the best way to do it, perhaps aided by markers for mean, median, std dev, etc., as the case merits.
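As a rough sketch of what I mean by markers, here is a text-only histogram in Python that flags the bins containing the mean and the median. The ages are invented for illustration (not the Titanic data), the function names are my own, and it assumes integer values with a fixed bin width.

```python
import statistics

def histogram(values, bin_width):
    """Count integer values into fixed-width bins keyed by each bin's lower edge."""
    counts = {}
    for v in values:
        edge = (v // bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    lo, hi = min(counts), max(counts)
    # Include empty bins so gaps in the distribution stay visible.
    return {e: counts.get(e, 0) for e in range(lo, hi + bin_width, bin_width)}

def render(values, bin_width):
    """Draw one row of '#' per bin and mark the bins holding the mean and median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    lines = []
    for edge, count in histogram(values, bin_width).items():
        markers = ""
        if edge <= mean < edge + bin_width:
            markers += " <- mean"
        if edge <= median < edge + bin_width:
            markers += " <- median"
        label = f"{edge:>3}-{edge + bin_width - 1:<3}"
        lines.append(f"{label} {'#' * count}{markers}")
    return "\n".join(lines)

# Invented ages, for illustration only.
ages = [2, 8, 19, 21, 22, 24, 25, 27, 28, 29, 31, 33, 35, 38, 42, 47, 54, 61]
print(render(ages, 10))
```

A real plot would draw vertical lines at the exact values rather than tagging whole bins, but the idea is the same.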

Disclosure: I work at Statwing.


With box plots? http://benchmarksgame.alioth.debian.org/u64q/which-programs-... (They can show a bit more, though: min, max, and either the interquartile range or mean +/- 1 std dev, plus the mean itself.) See also http://en.wikipedia.org/wiki/Box_plot


Even just some error bars would help: http://en.wikipedia.org/wiki/Error_bar

They're typically used for confidence intervals or standard error, but in principle you can use them with any measure of variability, as long as you state which one you are showing.
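A small sketch of how the same data gives different bar lengths depending on the measure, using only Python's standard library. The function name and the measurements are hypothetical, and the 95% interval uses the normal approximation (1.96 standard errors); for a sample this small a t-based interval would be somewhat wider.

```python
import math
import statistics

def error_bar(values, kind="sd"):
    """Return (lower, mean, upper) for the chosen measure of variability.

    kind: "sd"   -> mean +/- one sample standard deviation
          "se"   -> mean +/- one standard error of the mean
          "ci95" -> mean +/- 1.96 standard errors (normal approximation)
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    se = sd / math.sqrt(len(values))
    if kind == "sd":
        half = sd
    elif kind == "se":
        half = se
    elif kind == "ci95":
        half = 1.96 * se
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return mean - half, mean, mean + half

# Made-up measurements, for illustration only.
runs = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2, 5.0, 4.7]
for kind in ("sd", "se", "ci95"):
    lower, mean, upper = error_bar(runs, kind)
    print(f"{kind:>4}: {lower:.3f} .. {mean:.3f} .. {upper:.3f}")
```

The three bars differ a lot in length for the same data, which is why the caption needs to say which one is plotted.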



