
Publication #PEOD8

Analyzing Survey Data1

Glenn D. Israel2

For many Extension professionals, the task of analyzing survey data creates feelings of anxiety or frustration. One reason is that few people have opportunities, beyond completing thesis research, to develop skills in applied data analysis. This problem is compounded by the common practice of collecting information about items that are not critical for completing the study. The combination of limited skill in applied data analysis and an overabundance of data can create gridlock in this process.

The purpose of this paper is to provide a strategy for conducting an analysis of survey data. Of course, the purpose of the study is an important influence on the process for analyzing survey data. One set of procedures could be used if a study is intended to identify needs or to describe the impact of an Extension program, while other steps may be necessary for conducting a more rigorous study of causal effects. With this point in mind, the focus of this paper is on describing and summarizing needs or impacts. In the first section, the nature of survey data is discussed. This is followed by a review of the logic of analyzing survey data and then by an illustration of the recommended procedures.

Before starting any analysis, the data should be reviewed to identify and correct errors in the data which may have occurred when the dataset was created. Techniques for screening a dataset are outlined in Phases of Data Analysis (Israel, 1992).

The Nature Of Survey Data

The process for analyzing survey data depends in no small part on the type of data and the number of items or questions in the survey. In general, most mail and telephone surveys use primarily closed format questions with categorical response options. A typical question using this format would ask, "How much did you learn about ___________?" and provide response options of "A lot," "Some," "A little," "Nothing at all," and "Don't Know." This type of data requires the use of statistical methods that are appropriate for categorical data.

The analysis of data for a single question on a survey is fairly simple and begins by describing how responses are distributed among the categories. With the help of a statistical software package (see endnote 1), a frequency table of counts and percentages can be calculated in a few seconds (Figure 1). The information from tables can be shown in graphical form to add color or emphasis in presentations for your audiences.
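As a rough illustration of the frequency table described above, the following sketch counts responses and converts the counts to percentages using only the Python standard library. The response data and the `frequency_table` helper are hypothetical and are not taken from Figure 1.

```python
from collections import Counter

# Hypothetical responses to "How much did you learn about ___?"
responses = ["A lot", "Some", "Some", "A little", "A lot",
             "Nothing at all", "Some", "Don't Know", "A lot", "Some"]

# Keep the categories in questionnaire order rather than alphabetical order.
CATEGORIES = ["A lot", "Some", "A little", "Nothing at all", "Don't Know"]


def frequency_table(values, categories):
    """Return a list of (category, count, percent) rows."""
    counts = Counter(values)
    total = len(values)
    if total == 0:
        # No responses: report zero counts and zero percentages.
        return [(cat, 0, 0.0) for cat in categories]
    return [(cat, counts[cat], 100.0 * counts[cat] / total) for cat in categories]


table = frequency_table(responses, CATEGORIES)
for category, count, percent in table:
    print(f"{category:<15} {count:>3} {percent:>6.1f}%")
```

Listing the categories explicitly keeps empty categories in the table, which matters when a response option was offered but nobody chose it.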

Figure 1. 

Tabular analysis, using two-way and three-way tables, can be used to describe relationships between an item and others in the survey. The selection of the variables to include in any tabular analysis should be based on theory, research findings, or careful thinking. Otherwise, the analysis runs the risk of fulfilling the old adage "Garbage in, garbage out."
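A two-way table can be sketched in a few lines. The example below counts respondents in each cell and reports row percentages; the variable names, the data, and the `two_way_table` helper are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# Hypothetical (education, amount learned) pairs, one per respondent.
records = [
    ("High school", "A lot"), ("High school", "Some"), ("High school", "Some"),
    ("College", "A lot"), ("College", "A lot"), ("College", "Some"),
    ("College", "A lot"), ("High school", "A little"),
]


def two_way_table(pairs):
    """Count respondents in each (row, column) cell and return row percentages."""
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    rows = sorted(row_totals)
    cols = sorted({col for _, col in pairs})
    return {
        row: {col: 100.0 * cell_counts[(row, col)] / row_totals[row] for col in cols}
        for row in rows
    }


row_percents = two_way_table(records)
for row, cells in row_percents.items():
    print(row, {col: round(pct, 1) for col, pct in cells.items()})
```

Row percentages are used here because they let the groups in the rows be compared directly even when the groups differ in size.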

The Logic Of Data Analysis

The task of data analysis becomes more complex when the number of items or questions is large. A frequency table can be constructed for each item on the survey but this can result in upwards of 50 or 100 tables. As the volume of statistics increases, the problem becomes one of organizing the results into a coherent and meaningful set of findings.

Because of the variety in form and content of surveys, there is no one best way to analyze data. However, a series of techniques is outlined below that provides several options for organizing and interpreting survey data.

Establish Priorities

Many questionnaires have questions asking about different topics but using the same response format. For example, a needs assessment survey might include items asking respondents to rate how serious a problem is (Figure 2). One way to organize this data is to rank-order the items by the percent of respondents who say an item is a serious problem (or a moderate or serious problem). In this way, information about a number of items can be presented in a single table or figure (Figure 3). Note that the same shading pattern was used to highlight the group of items with a similar percent of the measured response. Based on the rankings in Figure 3, healthcare, jobs, and housing could be interpreted as the top priority needs in the community being studied. In sum, organizing the data in this way facilitates comparison among similar types of items from a survey. This method also is useful for summarizing data for evaluations.
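The ranking procedure described above can be sketched as follows. The items, the ratings, and the helper names `percent_in` and `rank_items` are hypothetical; they do not reproduce the data behind Figures 2 and 3.

```python
# Hypothetical ratings per item: each list holds one rating per respondent.
ratings = {
    "Health care": ["Serious", "Serious", "Moderate", "Serious", "Not a problem"],
    "Jobs":        ["Serious", "Moderate", "Serious", "Slight", "Moderate"],
    "Housing":     ["Moderate", "Serious", "Slight", "Slight", "Not a problem"],
    "Recreation":  ["Slight", "Not a problem", "Slight", "Moderate", "Not a problem"],
}


def percent_in(values, target_categories):
    """Percent of responses that fall in any of the target categories."""
    hits = sum(1 for value in values if value in target_categories)
    return 100.0 * hits / len(values)


def rank_items(item_ratings, target_categories):
    """Return (item, percent) pairs sorted from highest to lowest percent."""
    scored = [(item, percent_in(values, target_categories))
              for item, values in item_ratings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Rank by "Serious" only, then by "Moderate" or "Serious" combined.
ranking = rank_items(ratings, {"Serious"})
combined_ranking = rank_items(ratings, {"Serious", "Moderate"})

for item, percent in ranking:
    print(f"{item:<12} {percent:5.1f}% serious")
```

Passing the target categories as a set makes it easy to switch between the "serious" ranking and the combined "moderate or serious" ranking mentioned in the text.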

Figure 2. 

Figure 3. 

Create Indexes

Indexes and scales are useful data-reduction tools. Both are ordinal level measures that are composites of two or more questionnaire items. Indexes are more commonly used than scales and are the simple cumulation of scores that are assigned to the response categories for a group of items in a survey (Babbie, 1973; see endnote 2). That is, the index score is the sum of the scores of the items that comprise the index.
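A minimal sketch of an index score, assuming each response category has already been assigned a numeric score, is shown below. The scoring scheme, the three items, and the `index_score` helper are hypothetical.

```python
# Hypothetical scoring: 1 if the respondent reports using a practice, else 0.
SCORES = {"Yes": 1, "No": 0}

# Hypothetical answers to three index items for each respondent.
respondents = {
    "R1": ["Yes", "No", "Yes"],
    "R2": ["Yes", "Yes", "Yes"],
    "R3": ["No", "No", "No"],
}


def index_score(answers, scores):
    """Sum the scores assigned to a respondent's answers on the index items."""
    return sum(scores[answer] for answer in answers)


index_scores = {rid: index_score(answers, SCORES)
                for rid, answers in respondents.items()}
print(index_scores)
```

This sketch assumes every respondent answered every item. In practice, a rule for missing answers (for example, excluding incomplete cases) would need to be chosen before summing.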

Item Selection

Items for an index should have face validity. That is, each item should measure, more or less, the concept encompassed by the index. For example, nine items have been selected for an index to measure (pre- and post- program) adoption of best management practices (BMPs) for irrigation systems as part of an evaluation (Figure 4). The content of these items has a logical consistency in that each item is relevant to irrigating residential landscapes.

Figure 4. 

Relationships Among Items

Unidimensionality is a characteristic of valid indexes (Babbie, 1973). This means that the items in the index measure a single dimension or concept. How well a set of items in an index represents a single dimension depends on whether a concept is broadly or narrowly defined. Broadly defined concepts may not meet standards of an "acceptable" index as often as more narrowly defined ones because the former can encompass more than one aspect of the concept. In any event, items to be included in an index should be related to each other.

The relationships among items of an index can be examined first at the bivariate level and later at the multivariate level. Bivariate relationships can be viewed through tabular analysis (constructing two-way tables and calculating chi-square, phi, and related statistics) or correlations. Items in an index should show a pattern of moderately strong, positive associations, assuming that the items are scored in the same direction. The correlation matrix of the irrigation BMPs index illustrates this pattern (Figure 5).
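To make the correlation approach concrete, the sketch below computes Pearson correlations for every pair of items. The item scores are hypothetical (they are not the data in Figure 5), and the `pearson` and `correlation_matrix` helpers are written out here rather than taken from a statistical package.

```python
from math import sqrt

# Hypothetical item scores, one value per respondent, scored in the same direction.
items = {
    "item_a": [1, 2, 3, 4, 5],
    "item_b": [2, 1, 4, 3, 5],
    "item_c": [1, 3, 2, 5, 4],
}


def pearson(x, y):
    """Pearson correlation of two equal-length lists (each must vary)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def correlation_matrix(item_scores):
    """Return a nested dict of pairwise correlations among the items."""
    names = list(item_scores)
    return {
        row: {col: pearson(item_scores[row], item_scores[col]) for col in names}
        for row in names
    }


matrix = correlation_matrix(items)
for name, row in matrix.items():
    print(name, {col: round(r, 2) for col, r in row.items()})
```

An item whose correlations with the others are near zero or negative would be a candidate for rescoring or removal from the index.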

Figure 5. 

The statistic Cronbach's alpha is often used to summarize the internal consistency of items included in an index (Carmines and Zeller, 1979). An alpha of .8 is generally considered to indicate good internal consistency for an index, although an alpha of .6 may be acceptable for exploratory research. The alpha for the items in Figure 4 is .806, indicating good internal consistency in the index.
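The paper does not give the formula, so the sketch below assumes the commonly used variance form of Cronbach's alpha: k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores, where k is the number of items. The data are hypothetical and do not reproduce the .806 reported for Figure 4.

```python
from statistics import variance

# Hypothetical item scores: one list per item, one value per respondent.
item_scores = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 4],
]


def cronbach_alpha(items):
    """Cronbach's alpha from per-item score lists of equal length."""
    k = len(items)
    # Sum of the sample variances of the individual items.
    sum_item_variances = sum(variance(scores) for scores in items)
    # Total (index) score for each respondent, then its sample variance.
    totals = [sum(answers) for answers in zip(*items)]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


alpha = cronbach_alpha(item_scores)
print(round(alpha, 3))
```

The function needs at least two items and at least two respondents, and the total scores must vary; otherwise the variances are undefined or the division fails.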

Unidimensionality of items in an index can be assessed at the multivariate level using confirmatory factor analysis. This statistical method provides a rigorous test of an index. However, this technique is complex and requires expertise beyond that of most non-statisticians. Interested readers should consult J. Scott Long's (1983) Confirmatory Factor Analysis.

Using Indexes

Once a satisfactory index is created, a frequency table or other summary statistics can be calculated. As illustrated by the irrigation BMP index, a single frequency table can summarize information about nine items from the questionnaire (Figure 6). An index can be analyzed in the same way as any other single variable. This process is more fully outlined in Phases of Data Analysis, PEOD-1 (Israel, 1992).

Figure 6. 

When a community needs assessment survey is being analyzed, one might first focus on the items related to economic development and then examine a second or third topic later. Within a single content area, the response patterns of the items are compared to identify similarities or differences. For example, if one item asks how great a need is using 5 response categories and another item asks whether something should be done using a yes/no format, then one comparison can be based on the percent of "great need" (or the combined percentage of "great" and "some need") responses for the first item and the percent of "yes" responses for the second item. Applying this strategy to interpreting the items in Figure 7, one might conclude that the first indicates a larger problem than the second. This strategy provides a means of making rough comparisons among survey items which have dissimilar response categories.

Figure 7. 

Ransack Data To Identify Relationships

To this point, the analysis of survey data has focused on describing items on a questionnaire individually or by comparison with other items. Much more can be learned about data through the relationships between items. One common practice in the analysis of survey data is to look for relationships between specific topics and the socio-demographic characteristics of respondents. Identifying these relationships for a needs assessment survey can help identify segments of the population with unique sets of needs. Similarly, identifying relationships between specific topics and the socio-demographic characteristics of respondents for an evaluation can help identify which segments of program participants changed the most, and this information can lead to ideas for improving the delivery and impact of a program.

The procedures for examining relationships between specific topics and respondent characteristics are sometimes called ransacking the data. This process begins by calculating two-way tables (cross-tabulations) and the accompanying statistics (chi-square and associated probability level) for each pair of items. A large number of tables will be generated for most surveys. By reviewing the chi-square statistic and associated probability level for each table, relationships which meet the criteria for statistical significance can be identified (see endnote 3). Next, patterns among the items showing significant relationships can be identified. For example, if education is found to be associated with a number of different needs in a community but age is not, then further attention can be focused on the former. In this example, a researcher would then describe how education is related to the set of items.
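The screening step can be sketched as below. The code computes the Pearson chi-square statistic for each two-way table and flags the tables whose probability level falls below a chosen threshold. The tables are hypothetical, the `chi_square` and `p_value_df1` helpers are illustrative, and the probability calculation shown is valid only for 2x2 tables (one degree of freedom); larger tables would need a statistical package or a chi-square table. No continuity correction is applied.

```python
from math import erfc, sqrt


def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column items were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2.0))


# Hypothetical 2x2 tables: demographic item (rows) by a reported need (columns).
tables = {
    "education": [[30, 10], [10, 30]],
    "age": [[20, 20], [20, 20]],
}

ALPHA = 0.01  # small probability level, because many tables are screened

significant = []
for name, table in tables.items():
    statistic, df = chi_square(table)
    if df == 1 and p_value_df1(statistic) < ALPHA:
        significant.append(name)

print(significant)
```

Tables with any row or column total of zero would produce an expected count of zero and must be excluded or collapsed before the statistic is calculated.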

Elaborate Relationships

The final and most complex phase of the data analysis involves examining the relationships between a substantive item and a demographic item while controlling for the effects of other items. An analysis which shows a significant effect when controlling for other factors further increases confidence in conclusions about program need or program impact.

This final phase of the analysis involves elaborating on the relationship between survey items. In essence, elaboration helps us clarify the relationship between two items because we take into consideration the context or environment of that relationship. The elaborating process is more fully explained in Elaborating Program Impacts Through Data Analysis, PEOD-3 (Israel, 1992). Analysis in this phase involves the use of either tabular analysis or multivariate statistical techniques, such as regression, analysis of covariance, and logistic regression.
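For the tabular route, one simple way to control for a third item is to repeat the two-way comparison within each category of that item (a three-way table). The sketch below does this with hypothetical data; the variables (home tenure as the control, education as the demographic item, practice adoption as the substantive item) and the `percent_by_group` helper are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (control item, demographic item, substantive item).
records = (
    [("Owner", "College", "Adopted")] * 3
    + [("Owner", "College", "Not adopted")] * 1
    + [("Owner", "High school", "Adopted")] * 1
    + [("Owner", "High school", "Not adopted")] * 3
    + [("Renter", "College", "Adopted")] * 2
    + [("Renter", "College", "Not adopted")] * 2
    + [("Renter", "High school", "Adopted")] * 2
    + [("Renter", "High school", "Not adopted")] * 2
)


def percent_by_group(rows, outcome):
    """Percent reporting `outcome` in each demographic group, per control level."""
    totals = defaultdict(Counter)
    hits = defaultdict(Counter)
    for control, group, result in rows:
        totals[control][group] += 1
        if result == outcome:
            hits[control][group] += 1
    return {
        control: {group: 100.0 * hits[control][group] / count
                  for group, count in groups.items()}
        for control, groups in totals.items()
    }


adoption = percent_by_group(records, "Adopted")
for control, groups in adoption.items():
    print(control, groups)
```

In this made-up example, education is related to adoption among owners but not among renters, which is the kind of contextual detail that elaboration is meant to reveal.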

Endnotes

  1. A number of statistical software packages can be licensed or purchased for use on a personal computer, including SAS, SPSS, Number Cruncher and others. Many database and spreadsheet programs, such as dBase, Quattro Pro, Lotus 1-2-3, etc., also can be used for data analysis, but many of these programs do not have the full range of analytic procedures contained in the statistical packages.

  2. Scales differ from indexes by including measurement of the intensity structure among individual items (Babbie, 1973). In a scale, the response of a person on one item implies how that person will respond to others. For example, given a scale for evaluating skills, if a person learned a complex practice during an Extension program, that person would also be expected to have learned simpler practices (but not necessarily the most complex ones). An index may only count the number of skills which were learned without reference to the complexity of each practice.

  3. A small probability level should be used in assessing statistical significance for a large number of tables. This level should be .01 or .001 (the latter is more rigorous). A small probability level is used because the risk of attributing significance to a relationship which is due solely to chance increases with the number of tables that are constructed. Thus, if a .01 level of significance is used, about one table out of every 100 would be expected to appear significant on the basis of chance alone and not due to a "real" relationship between the two items.
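The arithmetic behind endnote 3 can be sketched as follows. The expected number of chance findings is the number of tables multiplied by the probability level. The chance of at least one such finding is calculated here under the simplifying assumption that the tables are independent of one another, which is rarely exactly true for items from the same survey. The function names are hypothetical.

```python
def expected_chance_findings(num_tables, level):
    """Expected number of tables that appear significant by chance alone."""
    return num_tables * level


def chance_of_any_false_finding(num_tables, level):
    """Probability of at least one chance finding, assuming independent tables."""
    return 1.0 - (1.0 - level) ** num_tables


for level in (0.05, 0.01, 0.001):
    expected = expected_chance_findings(100, level)
    any_false = chance_of_any_false_finding(100, level)
    print(f"level {level}: expect {expected:.1f} chance findings in 100 tables; "
          f"P(at least one) = {any_false:.2f}")
```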

References

Babbie, Earl R. 1973. Survey Research Methods. Belmont, CA: Wadsworth Publishing Company, Inc.

Carmines, Edward G. and Richard A. Zeller. 1979. Reliability and Validity Assessment. Sage University Paper series on Quantitative Applications in the Social Sciences, 07-017. Beverly Hills: Sage Publications.

Israel, Glenn D. 1992. Phases of Data Analysis. Program Evaluation and Organizational Development, IFAS, University of Florida. PEOD-1. October.

Israel, Glenn D. 1992. Elaborating Program Impacts Through Data Analysis. Program Evaluation and Organizational Development, IFAS, University of Florida. PEOD-3. September.

Long, J. Scott. 1983. Confirmatory Factor Analysis: A Preface to LISREL. Sage University Paper series on Quantitative Applications in the Social Sciences, 07-033. Beverly Hills: Sage Publications.

Footnotes

  1. This document is PEOD8, one of a series of the Agricultural Education and Communication Department, Florida Cooperative Extension Service, Institute of Food and Agricultural Sciences, University of Florida. Original publication date November 1992. Revised April 2009. Reviewed June 2012. Visit the EDIS website at http://edis.ifas.ufl.edu.

  2. Glenn D. Israel, associate professor, Department of Agricultural Education and Communication, and extension specialist, Cooperative Extension Service, Institute of Food and Agricultural Sciences (IFAS), University of Florida, Gainesville, FL 32611.


The Institute of Food and Agricultural Sciences (IFAS) is an Equal Opportunity Institution authorized to provide research, educational information and other services only to individuals and institutions that function with non-discrimination with respect to race, creed, color, religion, age, disability, sex, sexual orientation, marital status, national origin, political opinions or affiliations. For more information on obtaining other UF/IFAS Extension publications, contact your county's UF/IFAS Extension office.

U.S. Department of Agriculture, UF/IFAS Extension Service, University of Florida, IFAS, Florida A & M University Cooperative Extension Program, and Boards of County Commissioners Cooperating. Nick T. Place, dean for UF/IFAS Extension.