Excel for Statistical Analysis
Custom Workshop, February 2009
Presenter: Anu Vedantham
Questions? Email wic1@pobox.upenn.edu
Library ITA for basic Excel assistance: Mon. through Thurs, 3 to 6 pm, walk-in.
Office 2007 Basics
- Ribbon - Overview
- Customizing the menu bar
- Keyboard shortcuts (Press Alt to see them)
- Verifying Data Analysis Add-ins - Click on Data tab - do you see the data analysis group? If not, do the following: Microsoft Office Button / Excel Options / Add-Ins / Manage Excel Add-Ins / select Analysis TookPak and add.
Excel and Data Analysis: Workshop and Online Tutorial
Article encouraging caution when using Excel for statistics
Download sample data file from Math Forum website
Possible topics: Displaying histograms, probabilities and data distributions: data plotting, functions, standard deviation, coefficient of variation, chi-squared test, t-test.
Basics of moving around the data:
- Ctrl-> go to right end of data
- Ctrl<- go to left end of data
- Ctrl-End go to last filled cell on the sheet
- Ctrl-Home go to Cell A1
- Selecting a row or column – click on its name
- Shift-click selects adjacent cells
- Ctrl-click selects random cells - can format but cannot copy these
- Copy and paste works with most types of selections – sometimes gives errors with existing cell configurations
- Paste Special options - helpful for taking data from the web or other programs
Notice that there are blank columns that need to be connected and a blank row is needed before "All Students".
Sorting, ranges and data generation
- Sorting
- Sorting looks for continuous ranges - no empty rows or columns
- You can expand the range by filling in blank columns - any content will work
- Ranges
- Name a range: Select the data, then click on the Name Box (upper left of screen) and create a name. Let's name Math 2004, Math 1990, Part 2004 and Part 1990
- Move to the range: Click on the Name Box and pick out its name. Once named you can use named range in any function easily
- Series Generation and ID numbers
- Add an ID column to the left of the data.
- Use AutoFill to number the rows. This is very helpful when importing data from databases or websites.
- Look at Autofill options for generating simple equations
- This also allows trend generation and linear extrapolation
Descriptive Statistics: Data / Data Analysis / Descriptive Statistics - Notice options on output.
Histogram: Data / Data Analysis / Histogram - Make histograms for the participation rates in 2004 and 1990.
Try the automated bin creation. Then specify bins of 10% each from 0 to 100. What can you conclude about the distribution?
T-test: try a Paired T-test for Math 1990 and 2004.
Sampling options: random, periodic
Functions:
- Absolute and relative references
- Copying and adapting functions
- Displaying functions
Plotting:
- Working with graphs and charts - formatting points
- Highlighting data points to support discussion - scatter plot example
Filters - Under Data Tab
- Try filtering by one field - find misspellings, multiple versions
- Filtering by multiple fields - basic queries
- Number filters - Top 10… options to look at high and low values, percents
Conditional Formatting
- Open third sample spreadsheet - adapted from an Amherst College tutorial.
- Select all the data and name it
- Home / Conditional Formatting -
- Simple - try setting positive numbers to green
- Next steps - wonderful new features in Office 2007. Try the automated color scale.
- Now try a double format – red for more than 20 degrees and blue for less than -20 degrees. Browse the sheet now and the data is easy to understand
- You can get sophisticated with formulas as well.
Chi-squared function: CHIINV