wiclogoExcel for Statistical Analysis
Custom Workshop, February 2009
Presenter: Anu Vedantham
Questions? Email wic1@pobox.upenn.edu

Library ITA for basic Excel assistance: Mon. through Thurs, 3 to 6 pm, walk-in.

Office 2007 Basics

  1. Ribbon - Overview
  2. Customizing the menu bar
  3. Keyboard shortcuts (Press Alt to see them)
  4. Verifying Data Analysis Add-ins - Click on Data tab - do you see the data analysis group? If not, do the following: Microsoft Office Button / Excel Options / Add-Ins / Manage Excel Add-Ins / select Analysis TookPak and add.

Excel and Data Analysis: Workshop and Online Tutorial

Article encouraging caution when using Excel for statistics

Download sample data file from Math Forum website

Possible topics: Displaying histograms, probabilities and data distributions: data plotting, functions, standard deviation, coefficient of variation, chi-squared test, t-test.

Basics of moving around the data:

    1. Ctrl-> go to right end of data
    2. Ctrl<- go to left end of data
    3. Ctrl-End go to last filled cell on the sheet
    4. Ctrl-Home go to Cell A1
    5. Selecting a row or column – click on its name
    6. Shift-click selects adjacent cells
    7. Ctrl-click selects random cells - can format but cannot copy these
    8. Copy and paste works with most types of selections – sometimes gives errors with existing cell configurations
    9. Paste Special options - helpful for taking data from the web or other programs

Notice that there are blank columns that need to be connected and a blank row is needed before "All Students".

Sorting, ranges and data generation

    1. Sorting
      1. Sorting looks for continuous ranges - no empty rows or columns
      2. You can expand the range by filling in blank columns - any content will work
    2. Ranges
      1. Name a range: Select the data, then click on the Name Box (upper left of screen) and create a name. Let's name Math 2004, Math 1990, Part 2004 and Part 1990
      2. Move to the range: Click on the Name Box and pick out its name. Once named you can use named range in any function easily
    3. Series Generation and ID numbers
      1. Add an ID column to the left of the data.
      2. Use AutoFill to number the rows. This is very helpful when importing data from databases or websites.
      3. Look at Autofill options for generating simple equations
      4. This also allows trend generation and linear extrapolation

Descriptive Statistics: Data / Data Analysis / Descriptive Statistics - Notice options on output.

Histogram: Data / Data Analysis / Histogram - Make histograms for the participation rates in 2004 and 1990.

Try the automated bin creation. Then specify bins of 10% each from 0 to 100. What can you conclude about the distribution?

T-test: try a Paired T-test for Math 1990 and 2004.

Sampling options: random, periodic

Functions:

  1. Absolute and relative references
  2. Copying and adapting functions
  3. Displaying functions

Plotting:

  1. Working with graphs and charts - formatting points
  2. Highlighting data points to support discussion - scatter plot example

Filters - Under Data Tab

    1. Try filtering by one field - find misspellings, multiple versions
    2. Filtering by multiple fields - basic queries
    3. Number filters - Top 10… options to look at high and low values, percents

Conditional Formatting

    1. Open third sample spreadsheet - adapted from an Amherst College tutorial.
    2. Select all the data and name it
    3. Home / Conditional Formatting -
    4. Simple - try setting positive numbers to green
    5. Next steps - wonderful new features in Office 2007. Try the automated color scale.
    6. Now try a double format – red for more than 20 degrees and blue for less than -20 degrees. Browse the sheet now and the data is easy to understand
    7. You can get sophisticated with formulas as well.

Chi-squared function: CHIINV