University of Florida: Solutions for Your Life

Publication #ENH1318

Small-to-Medium-Scale Sensory Evaluation of Horticultural Crops—Measuring Responses1

Sean Michael Campbell and Charles A. Sims2

Introduction

Measuring the responses of panelists is the main principle behind sensory evaluation, and a variety of sensory tests can be used to do it. In an ideal world, sensory panelists would be like machines, capable of being calibrated for maximum precision and accuracy. In reality, all of the factors that make people unique also cause them to experience sensory attributes differently. The study of these differences is called psychophysics and is a large area of focus for many university psychology departments. While this may seem like an abstract idea, these concepts affect our everyday lives. For example, maple syrup, a valuable agricultural commodity, can be rated based on color, clarity, and flavor, with the lighter Grade A varieties selling for a higher market value than the darker Grade B varieties (Figure 1).

Figure 1. Four grades (from left to right, Vermont Fancy, Grade A Medium Amber, Grade A Dark Amber, and Grade B) of maple syrup.

Credit: Dvortygirl, Wikimedia Commons

This publication is the second in a series designed to assist producers in the small-to-medium-scale sensory evaluation of their horticultural crops. It summarizes the types of sensory data available and their associated collection methods. The guidelines outlined here are taken from the 4th edition of Sensory Evaluation Techniques (Meilgaard, Civille, and Carr 2016).

Types of Data

When measuring panelist responses, data can be taken in multiple ways but is commonly collected as nominal, ordinal, interval, or ratio data. Nominal data involves placing items into different, mutually exclusive categories that differ in name but otherwise do not provide a quantitative (numerical) value or follow any order. The top row of Figure 2 shows images of hop (Humulus lupulus) cones from three different cultivars classified into three categories, but it otherwise offers little information on the relationships between them. Nominal data can be a useful tool for differentiating groups but lacks details about how these items relate or differ from each other.

Figure 2. Examples for the nominal, ordinal, interval, and ratio scales in hops (Humulus lupulus).

Credit: Sean M. Campbell, UF/IFAS

Ordinal data also involves grouping items into distinct categories, but these categories belong to an ordered series. An example might be the degree of browning of hop cones, which can be placed into three categories: light, medium, and heavy. While the ordinal scale and its associated data are limited, they do carry information about relationships between categories.

Interval data places items into distinct categories that are separated by a constant interval. In the example given in Figure 2, the three young hop plants increase in number of leaves from one to two to three, in each case by an interval of one.

Finally, ratio data involves assigning numbers to indicate how a sample compares to a control (e.g., twice as strong, half as sweet, etc.). As illustrated in Figure 2, the sizes of three hop cones were analyzed; the first cone is ¾ as tall as the second, which is in turn only ¾ as tall as the third.
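Because ratio data can be multiplied, the two comparisons in the hop cone example can be chained to compare the first cone directly with the third. The short Python sketch below works through that arithmetic; the variable names are illustrative only and are not part of the original publication.

```python
from fractions import Fraction

# Ratios from the hop cone example: each cone is 3/4 as tall as the next one.
first_to_second = Fraction(3, 4)
second_to_third = Fraction(3, 4)

# Chaining the two ratios compares the first cone directly with the third.
first_to_third = first_to_second * second_to_third

print(first_to_third)         # 9/16
print(float(first_to_third))  # 0.5625
```

In other words, the first cone is a little more than half as tall as the third. This kind of statement is only meaningful for ratio data.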

Sensory Measurement Techniques

In addition to the different sensory data types available when performing sensory evaluation, there are also multiple categories of sensory measurement techniques. The most commonly used are classification, grading, ranking, and scaling, which are described below in order of increasing complexity.

Classification

Classification, the simplest of the sensory measurement techniques, involves placing items being evaluated into groups based on nominal data—that is, differing in name, but not based on numerical values or any sort of order. Most commonly, this involves asking panelists to pick a descriptor from a list that best describes an attribute about the sample in question, often by checking it with an X. Results are reported as the number of checks per response.
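Counting the checks per response is simple to do by hand, but it can also be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the function name `tally_checks` and the ballots are hypothetical, and each panelist is assumed to have checked exactly one descriptor.

```python
from collections import Counter


def tally_checks(responses):
    """Count how many panelists checked each descriptor (nominal data)."""
    return Counter(responses)


# Hypothetical ballots: one checked descriptor per panelist.
ballots = ["sweet", "sour", "sweet", "bland", "sweet", "sour"]

counts = tally_checks(ballots)

# Report the number of checks per response, most frequent first.
for descriptor, n in counts.most_common():
    print(f"{descriptor}: {n}")
```

The counts only show how often each descriptor was chosen; they say nothing about order or magnitude, which is what makes the data nominal.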

Because the data collected is nominal and has no numerical value attached, selecting the proper descriptors is crucial. Most consumers are unfamiliar with the complex descriptors used in sensory evaluation (sucrose, acidity, nondescript), so more common terms must often be used (sweet, sour, bland).

Another problem is substitution: if panelists find a defect in a product or a noticeable difference between products and cannot find an appropriate descriptor for it, they will often select a substitute term from the list, which may not accurately reflect their perceptions. Therefore, it is important to make sure all possible options are provided, which can be done by researching the terms and expressions commonly used by your consumers or by looking up an existing list of appropriate terms created by a trained panel. The questions in Figure 3 provide examples of nominal data collection.

Figure 3. Classification examples using nominal data.

Credit: Sean M. Campbell, UF/IFAS

Grading

Grading involves assigning a value from a known scale to an item based on its performance, just as when assessing classroom assignments. In sensory terms, grading places items into ordinal categories, that is, distinct groups belonging to an ordered series. Graders use this ordered series to consider all relevant sensory factors about an item before they give one overall rating, usually chosen from several grade levels that depend on the product. The distinction between grading and ranking is that grading is commonly done by those who learned the craft from others and can now be called expert graders.

Grading is routinely used in meat, dairy, coffee, and tea, and it is primarily a way of protecting the consumer from product substitution or adulteration. Figure 4 illustrates six grades commonly used to describe the degree to which coffee beans have been roasted. Enforcing a series of standards associated with different grades guarantees consumers that they are getting what they paid for. It also allows producers to earn more money for their products; as an item goes up in grade or as that grade becomes more associated with quality, retailers can charge a premium for that product. There are some drawbacks to the grading system; namely, it requires expert graders trained in the appropriate method, which takes time and money. For that reason, traditional grading scales are being replaced with more automated sensory evaluation techniques (e.g., mechanization, computer imaging).

Figure 4. Six coffee roasting grades. From top left to bottom right: light cinnamon, cinnamon, normal, French roasting, espresso, and open fire.

Credit: Godewind, Wikimedia

Ranking

Like grading, ranking involves placing samples in ordinal format, usually based on the intensity of, or preference for, the attribute in question. Subjects are typically given between three and seven samples, and after evaluating them, they are asked to arrange the samples in order based on the intensity of the attribute or preference. If the attribute in question is “Sweetness Intensity,” panelists would place the sweetest sample in spot 1, followed by the next sweetest in spot 2, and so on. Rank totals are calculated for each sample.
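The rank totals can be calculated by adding up the spot each sample received from every panelist. The Python sketch below illustrates one way to do this; the function name `rank_totals`, the sample labels, and the ballots are hypothetical and are used only for illustration.

```python
def rank_totals(ballots):
    """Sum the rank each sample received across all panelists."""
    totals = {}
    for ballot in ballots:
        for sample, rank in ballot.items():
            totals[sample] = totals.get(sample, 0) + rank
    return totals


# Hypothetical ballots from three panelists (spot 1 = sweetest sample).
ballots = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 1, "C": 3},
    {"A": 1, "B": 3, "C": 2},
]

totals = rank_totals(ballots)

# List the samples from lowest to highest rank total.
for sample, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"Sample {sample}: rank total {total}")
```

With spot 1 assigned to the sweetest sample, the sample with the lowest rank total (Sample A in this made-up data) was judged sweetest overall.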

Ranking tests can be performed rapidly and with very little training, allowing for a wide array of applications. However, ranking tests cannot be used to provide intensity data (e.g., Sample A has 30% more intense sweetness than Sample B). Figure 5 shows an example of ordinal data collection using ranking.

Figure 5. Ranking example using ordinal data.

Credit: Sean M. Campbell, UF/IFAS

Scaling

Finally, scaling describes the process of comparing the sample to a predetermined scale, often based on numbers (1–10) or descriptive words (soft-hard). Several scaling methods are available, and each yields a different type of data. This publication will focus on two types, line scaling and category scaling. Regardless of the method used, a common issue with scaling is end-point bias. When using scales, panelists tend to preferentially use points toward the middle of the scale, saving the end points for extreme samples that may never come. Always make sure that your scale is large enough to describe the full range of experiences possible for the crop being tested, while still having enough points on the scale to distinguish minor differences between samples.

Line scaling is the simplest of scaling methods, where panelists are given a line and asked to place an x or other mark on it to match the intensity of the crop or associated attribute. The line has anchors on either end, where commonly the left side stands for a low or zero value and the right end for a high or maximum value (Figure 6). The marks can then be converted to numerical values by measuring their location on the line with a ruler. This type of scaling method is popular because it is very easy for untrained panelists to understand, but without distinct points on the scale for the panelists to reference, user error is often high.
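Converting a ruler measurement to a numerical value is a matter of proportion: the distance of the mark from the left anchor is divided by the full length of the line and multiplied by the maximum value of the scale. The Python sketch below shows this under stated assumptions; the function name `line_scale_score`, the 150 mm line, and the 0 to 10 scale are illustrative choices and do not come from the original publication.

```python
def line_scale_score(mark_mm, line_length_mm, scale_max=10.0):
    """Convert a mark's distance from the left anchor into a scale value.

    mark_mm: distance from the left (low) anchor to the mark, in millimeters.
    line_length_mm: full length of the printed line, in millimeters.
    scale_max: value assigned to the right (high) anchor (assumed, not from the source).
    """
    if line_length_mm <= 0:
        raise ValueError("line length must be positive")
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark must fall on the line")
    return mark_mm * scale_max / line_length_mm


# Hypothetical ballot: a 150 mm line with the mark measured 90 mm from the left.
score = line_scale_score(90, 150)
print(score)  # 6.0
```

Measuring every ballot from the same anchor and with the same ruler keeps the converted values comparable across panelists.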

Figure 6. Line scaling example using interval data.

Credit: Sean M. Campbell, UF/IFAS

Category scaling is likely one of the more recognizable forms of scaling; it involves asking panelists to rate the intensity of a crop or associated attribute by assigning it a value (category) on a limited scale (Figure 7). This is considered ordinal-level data: the items being tested are placed into distinct groups that belong to an ordered series, but the scale does not indicate how much the samples differ. For example, on a 9-point scale, the difference between a 2 and a 5 rating might not be the same as the difference between a 5 and an 8, and a score of 4 might not indicate a value half as much as a score of 8. A very popular category scale is the 9-point hedonic scale, commonly used to measure the overall acceptability of a food. The scale ranges from 1 (dislike extremely) through 5 (neither like nor dislike) to 9 (like extremely).
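One way to summarize 9-point hedonic scores while respecting their ordinal nature is to report how many panelists chose each category along with the median score. The Python sketch below is a minimal illustration; the function name `summarize_hedonic` and the scores are hypothetical, and the choice of counts and median as summaries is an assumption made here, not a recommendation taken from the source.

```python
from collections import Counter
from statistics import median

# Anchor labels stated in the text; the other six points are left unlabeled here.
ANCHORS = {1: "dislike extremely", 5: "neither like nor dislike", 9: "like extremely"}


def summarize_hedonic(scores):
    """Return the median and per-category counts for 9-point hedonic scores."""
    for s in scores:
        if not isinstance(s, int) or not 1 <= s <= 9:
            raise ValueError(f"score {s!r} is not a whole number from 1 to 9")
    counts = Counter(scores)
    return {
        "median": median(scores),
        "counts": {point: counts.get(point, 0) for point in range(1, 10)},
    }


# Hypothetical scores from seven panelists for one sample.
scores = [7, 8, 6, 7, 9, 5, 7]
summary = summarize_hedonic(scores)

print("Median score:", summary["median"])
for point, n in summary["counts"].items():
    label = ANCHORS.get(point, "")
    print(f"{point} {label}: {n}")
```

Both summaries depend only on the order of the categories, so they do not assume that the distance between a 2 and a 5 equals the distance between a 5 and an 8.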

Figure 7. Seven-point Likert category scaling example using ordinal and ratio data.

Credit: Sean M. Campbell, UF/IFAS

Summary

To accurately measure consumer perceptions of fruits and vegetables, data can be taken in a variety of forms, each providing more information than the last about the relationships between groups. Nominal data involves placing items into different, mutually exclusive categories that differ in name but otherwise do not provide a quantitative (numerical) value or follow any order. Ordinal data also involves grouping items into distinct categories, but these categories belong to an ordered series. Interval data places items into distinct categories that are separated by a constant interval. Finally, ratio data involves assigning numbers to indicate how a sample compares to a control (e.g., twice as strong, half as sweet, etc.).

To collect this sensory data, multiple sensory measurement techniques can be used, each resulting in a different type of data. Classification, the simplest of the sensory measurement techniques, involves placing the items being evaluated into groups based on nominal data, that is, groups that differ in name but are not based on numerical values or any sort of order. Grading involves assigning a value from a known scale to an item based on its performance; it is an example of ordinal data, or distinct groups belonging to an ordered series. Ranking also involves placing samples in ordinal format, usually based on the intensity of, or preference for, the attribute in question. Subjects are typically given between three and seven samples, and after evaluating them, they are asked to arrange the samples in order based on the intensity of the attribute or preference. Finally, scaling describes the process of comparing the sample to a predetermined scale, often based on numbers (1–10) or descriptive words (soft-hard).

Reference

Meilgaard, M. C., G. V. Civille, and B. T. Carr. 2016. Sensory Evaluation Techniques. Boca Raton: CRC Press.

Footnotes

1. This document is ENH1318, one of a series of the Environmental Horticulture Department, UF/IFAS Extension. Original publication date May 2020. Visit the EDIS website at https://edis.ifas.ufl.edu for the currently supported version of this publication.

2. Sean Michael Campbell, doctoral research assistant, Environmental Horticulture Department, UF/IFAS Mid-Florida Research and Education Center; and Charles A. Sims, professor, Food Science and Human Nutrition Department; UF/IFAS Extension, Gainesville, FL 32611.

