The links and resources below are designed to help health researchers find information that will be useful to them for performing statistical analysis. Since we're releasing these things into the wild, we've endeavoured to only include links or resources that we have had the time to check and validate. Our intention is to provide links to existing sources where possible, rather than replicating information that has been well developed elsewhere. We hope to keep this page updated regularly as we find new resources that we think will be of use to the local research community.
Our colleagues at the University of Otago, Christchurch, have a useful webpage on some guidelines for design and analysis of research projects.
Basic biostatistics video series
We have a series of short videos on basic biostatistics (freely available on the Vimeo website) – if you find them useful, please feel free to refer others to them.
The videos are cut into relatively short topic sections (ranging from about 5 to 15 minutes each), and cover a number of basic biostatistical content areas (e.g. sampling variability, the normal distribution, confidence intervals for means, and the basics of hypothesis testing).
Information documents
Choosing a statistical package
This pdf report covers some of the points you might want to consider when choosing a statistical data analysis package, and covers a number of potential options for people at all levels of experience. I prepared this document keeping in mind the variety of researchers we deal with in our consulting roles here at UOW.
The statistical group at UCLA have a pdf technical report discussing the differences between Stata, SAS, and SPSS, although this is from a slightly different perspective (that of a consultant/analyst rather than a project researcher).
Custom tools for data analysis
I have prepared this Excel spreadsheet for calculating confidence intervals on proportions and rates. It has been cross-validated against both Stata and OpenEpi.com for a variety of values. The main advantage of using this file rather than e.g. OpenEpi is that you can easily calculate confidence intervals for multiple proportions by copying and pasting formulae. This can be considerably quicker than entering ten or twenty pairs of numerator/denominator into OpenEpi.
The citation for the formulae implemented is given inside the Excel file. Confidence intervals for binomial proportions are calculated using Fisher's exact methods, while confidence intervals for rates are based on the Delta method (this may be updated at some stage to give exact confidence intervals for rates).
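For readers who would rather script this kind of calculation than use a spreadsheet, the following is a minimal Python sketch of the two ideas mentioned above. It is not the formula from the Excel file. It assumes that the exact interval for a proportion is the one commonly called the Clopper-Pearson interval, and that the Delta method is applied to the log of the rate (so the standard error of the log rate is approximately 1/sqrt(events)). The function names and the example counts are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed directly (fine for modest n)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo=0.0, hi=1.0, iters=100):
    """Bisection for a monotone function f whose sign differs at lo and hi."""
    f_lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2


def exact_proportion_ci(x, n, level=0.95):
    """Exact (Clopper-Pearson style) interval for x events out of n trials.

    Assumption: this is the usual tail-inversion definition, which may differ
    in detail from the formula cited inside the spreadsheet.
    """
    alpha = 1 - level
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


def rate_ci_log_delta(events, person_time, level=0.95):
    """Approximate interval for a rate, using the Delta method on the log scale.

    Assumption: events are Poisson counts, so SE(log rate) is about 1/sqrt(events).
    """
    if events <= 0:
        raise ValueError("the log-scale approximation needs at least one event")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    rate = events / person_time
    factor = math.exp(z / math.sqrt(events))
    return rate / factor, rate * factor


# Hypothetical numerator/denominator pairs, processed in one pass.
for x, n in [(5, 10), (12, 80), (0, 25)]:
    lo, hi = exact_proportion_ci(x, n)
    print(f"{x}/{n}: {x / n:.3f} ({lo:.3f} to {hi:.3f})")
```

As with the spreadsheet, the appeal is that a whole column of numerator/denominator pairs can be run at once. Any results should still be checked against a validated package such as Stata before being reported.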
Biostats web resources: Help on stats packages
The following pages/resources are very useful summaries of how to do different types of analysis, or deal with particular statistical packages.
ATS UCLA Webpages on Statistical Computing
For specific help in using major statistical analysis programmes, the UCLA Academic Technology Services website has a large number of worked examples, showing different statistical techniques in different research domains. The quality of the commentary is very high, and material is presented in a (mostly) non-technical manner (i.e. no page-long lists of formulae!).
More importantly, most examples are worked up for a variety of statistical packages (including SAS, SPSS, Stata, and R) which means you can use some of this material as a statistical Rosetta stone if you're learning a new package.
Biostats web resources: Information
Otago ebooks
As well as physical books in the general library, Otago has a number of ebooks that are available for staff/students and others with access to the Otago library network. These are available through the Otago library catalogue (Wellington library webpage link here), so do check there if you are looking for a particular book. Some publishers allow downloading of PDF files for offline reading (e.g. on an iPad or other tablet), while others use a webpage interface to display the texts, which I personally find harder to use for longer reading sessions.
We may add some links to recommended textbooks over time.
Statistics notes in the BMJ
A very informative series of articles in the BMJ about various statistical issues, ranging from the basic (means, standard deviations vs. standard errors) to the more esoteric. You can view a list of all of the topics on the BMJ's own Statistics Notes index web page, which brings up a few results at a time. An alternative list is available through Martin Bland's web page (Martin Bland is, along with Doug Altman, one of the two major contributors to the Statistics Notes.) This index was still being maintained when I last checked in November 2011.
Endgames: Statistical questions in the BMJ
A series of mini-quizzes on statistical questions, again from the BMJ. An index of the topics is available on the BMJ's website. These questions generally have a biomedical context, but ask about common and useful statistical concepts that will apply across most research fields.
A wider list of non-statistics related Endgames questions is also available on the BMJ website, and can be viewed according to content topic (e.g. paediatrics, psychiatry) and category (e.g. anatomical test, case report.)
Biostats web resources: blogs and mailing lists
The following list gives some blogs, mailing lists, and Q&A websites that may be of use or interest to people.
Statschat is a blog by Thomas Lumley, a professor of biostatistics at the University of Auckland. It talks about how statistics, and results from academic studies, are interpreted and reported in the New Zealand and international media (which is to say that the focus is often on how statistics are misinterpreted and misreported...). As well as amusing material, there are some good critical appraisal lessons to be gleaned from Thomas's blog.
Cross Validated is a Q&A site that allows for a non-traditional approach to the kinds of questions that are often asked on mailing lists. There are some good bits of information on the website, but as with any internet or mailing list resource you should treat the advice you might see or seek here as a starting point for more in-depth investigation, rather than an authoritative answer on the perfect method for dealing with your problem. The emphasis is on statistical methods, rather than programming -- technical implementation questions are usually covered on CV's big sister site, Stack Overflow.
Medstats is a mailing list about... statistics in medicine (surprise!) It is set up as a Google group, and operates as a mailing list to which you can subscribe. You can also freely browse the archives at the link to the left. The content is restricted to theoretical/practical questions about statistics in medicine -- questions about programming issues are discouraged (and would be better asked elsewhere), and posts about conferences or job advertisements and the like are not allowed.
ANZstat is an Australian and New Zealand statistics mailing list, and covers all branches of statistics (not just biostatistics): it tends to have a lot of posts about conferences and jobs, with a smaller trickle of comment on statistical methods or issues about statistics that arise in the Australasian media. Information about joining (and leaving) the mailing list is available through the link at the start of this paragraph.
Biostats web resources: Freeware programmes
The sections below give links to where you can access freeware or open source programmes for statistical analysis. One or more of the biostatisticians at UOW have experience in using these packages and can provide you with support on using them. For information on all of the statistical software we use and can support, please see [link to come -- list includes SAS, SPSS, Stata, R, Epi Info]
R
R is a pretty serious piece of freeware statistical analysis software, with a very large user community who have produced a huge number of packages to extend functionality. It can be downloaded through the R project website, and works under Windows, Mac, and Linux operating systems.
It's a great piece of software to use IF you know how to programme, or are prepared to learn how to do so -- otherwise this may be too far off the deep end. R was originally developed by Ross Ihaka and Robert Gentleman at the University of Auckland in NZ, just in case you're feeling patriotic/parochial.
James and Tak both use the RStudio environment (also freeware, available through the RStudio website), which is an integrated development environment (IDE) front end for R which can make your programming life a little smoother.
Epi Info
Epi Info was originally developed by the Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia, and is freeware -- you can download it from the CDC website. There are two currently available versions of the programme: version 3.5.3 has been around in a stable form for several years, and while it looks a little outdated it does what it claims to do very neatly.
As of October 2011, the CDC has released Epi Info v7, which can also be downloaded from the CDC website. The newer version certainly looks a lot nicer, and also has the advantage that it can be installed on a computer without administrator privileges. However, a few items are currently missing compared to the old Epi Info 3.5.3, including a histogram plotting function and a Cox proportional hazards model function.
Nostalgics can access some of the DOS-based interfaces from the programme's infancy through either version of the package (e.g. sample size calculations), but these are a bit tricky to run on modern operating systems (especially on 64-bit CPUs), and much of the functionality is now available in a more user-friendly manner on Open Epi (see immediately below).
Open Epi
Open Epi is an easily accessible webpage for statistical calculations. It is an open-source effort to rewrite the "instant calculator" aspects of Epi Info into a JavaScript-based webpage format. It is very good at what it does (confidence intervals or hypothesis tests for categorical data, and simple sample size calculations), but you should note in advance that, for security reasons, it doesn't have any data input/output functionality. If you need to do complex analyses on a complex dataset, you'll need to look elsewhere.