
Food Quality and Preference 19 (2008) 697–703

Contents lists available at ScienceDirect

Food Quality and Preference

journal homepage: www.elsevier.com/locate/foodqual

What is the validity of the sorting task for describing beers? A study using trained and untrained assessors

Maud Lelièvre a,b, * , Sylvie Chollet a , Hervé Abdi c , Dominique Valentin b

a Institut Supérieur d’Agriculture, 48 Boulevard Vauban, 59046 Lille Cedex, France

b UMR CSG 5170 CNRS, Inra, Université de Bourgogne, 21000 Dijon, France

c The University of Texas at Dallas, Richardson, TX 75083-0688, United States

Article info

Article history:
Received 30 August 2007
Received in revised form 9 May 2008
Accepted 9 May 2008
Available online 15 May 2008

Keywords: Sorting task; Description; Experts; Consumers; Beer; DISTATIS; Matching task

Abstract

In the sensory evaluation literature, it has been suggested that sorting tasks followed by a description of the groups of products can be used by consumers to describe products, but a closer look at this literature suggests that this claim needs to be evaluated. In this paper, we examine the validity of the sorting task for describing products by trained and untrained assessors. The experiment reported here consisted of two parts. In the first part, participants sorted nine commercial beers and then described each group with their own words or with a list of terms. In the second part, participants were asked to match each beer with one of their own sets of descriptors. The matching task was used to evaluate the validity of the sorting task for describing products. Results showed that (1) the categories of trained and untrained assessors were comparable, (2) trained and untrained assessors did not describe groups of beers similarly, (3) for both groups, the results of the matching task were not very good and showed high inter-individual variability, and (4) providing a list of terms did not seem to help the assessors. Overall, the results suggest that the sorting task followed by a description is not suited to a precise and reliable description of complex products such as beers but may be an interesting tool to probe assessors’ perception.

1. Introduction

The sorting task is a simple procedure for collecting similarity data in which participants group together stimuli based on their perceived similarities. It is based on categorization, which is a natural cognitive process routinely used in everyday life, and it does not require a quantitative response. This method has been routinely used by psychologists since the 1970s (e.g., Coxon, 1999; Healy & Miller, 1970). In the sensory domain, sorting tasks were first used to investigate the perceptual structure of odors (Chrea et al., 2005; Lawless, 1989; Lawless & Glatter, 1990; MacRae, Rawcliffe, Howgate, & Geelhoed, 1992; Stevens & O’Connell, 1996). Lawless, Sheng, and Knoops (1995) were the first to use a sorting task with a food product (cheese). Today, a large variety of products (food or non-food) have been studied with this method (see Abdi, Valentin, Chollet, & Chrea, 2007, for a review). Results of sorting tasks are generally analyzed using multidimensional scaling (MDS) or a variation of this method (e.g., DISTATIS, Abdi et al., 2007), or sometimes with additive trees (Abdi, 1990; Corter, 1996). Generally, authors using the sorting task report that it is an easy and rapid method for obtaining perceptual maps of a large set of products, even with untrained participants.

Some authors proposed to go one step further by adding a description phase to the sorting task in order to describe the products (Blancher et al., 2007; Cartier et al., 2006; Faye et al., 2004; Faye et al., 2006; Lawless et al., 1995; Lim & Lawless, 2005; Saint-Eve, Paçi Kora, & Martin, 2004; Tang & Heymann, 1999). After they have sorted the products, participants are asked to describe each group with words, which are then projected onto the perceptual map of the products. Using this procedure, Faye et al. (2004) studied the visual description of plastic pieces and compared the results of a free sorting task with description performed by consumers to a sensory profile performed by experts. These authors found that the conclusions reached with these two methods were quite similar for the product configurations and the words used to describe the products. Likewise, Faye et al. (2006) showed that the MDS positioning of leather samples obtained from a sorting task with description performed by consumers on visual and tactile characteristics was comparable to the sensory profile of experts. Moreover, these authors found that consumers and experts provided related descriptions. However, these two studies involved non-food products and their results might not generalize to food products. In fact, the authors suggest that their results were specific to the case of visual and tactile senses and that their samples were easy to differentiate. In the food domain, the most recent study comparing a sorting task and a descriptive analysis method is reported in Blancher et al. (2007). In this study, a conventional profile of visual appearance and texture of jellies was compared to a sorting task with description and a Flash profile, which combines free choice profiling and a comparative evaluation of all the products (Dairou & Sieffermann, 2002; Delarue & Sieffermann, 2004). The authors found that the Flash profile and the sorting task provided sensory maps similar to those of the conventional profile for both a French and a Vietnamese panel, but that the configurations obtained with the conventional profile were more similar to the configurations obtained with the Flash profile than to those obtained with the sorting task. Another recent paper from Cartier et al. (2006) showed similar results between a quantitative descriptive analysis and a sorting task with description on breakfast cereals. In this work, trained assessors performed a quantitative descriptive analysis on a set of 14 commercial breakfast cereals by rating 22 attributes of texture and flavor. Then, the same trained assessors and a group of untrained assessors performed a sorting task on the same set of breakfast cereals followed by a description of their groups of products. The authors found that products were grouped similarly in the MDS configurations derived from the sorting task and in the principal component analysis configurations derived from the sensory profile. Products were described with more terms in the sensory profile than in the sorting task and, even though many terms were common to both methods, the descriptions of the groups of products were not exactly the same, especially for untrained assessors. The authors concluded that the sorting task associated with a description is a time-effective alternative to quantitative descriptive analysis because the sorting task can provide a rough description of a large set of products. Nevertheless, some critical points emerge from a careful reading of the literature.

* Corresponding author. Address: Institut Supérieur d’Agriculture, 48 Boulevard Vauban, 59046 Lille Cedex, France. Tel.: +33 3 28 38 48 01; fax: +33 3 28 38 48 47.
E-mail addresses: m.lelievre@isa-lille.fr, lelievremaud@yahoo.fr (M. Lelièvre).
doi:10.1016/j.foodqual.2008.05.001

Several works comparing trained and untrained assessors on categorization tasks reveal that the untrained assessors’ descriptions are not always comparable to the experts’ descriptions. Indeed, many authors report that trained assessors tend to be more efficient in their descriptions than untrained assessors. For example, Soufflet, Calonnier, and Dacremont (2004) found that experts showed better abilities than untrained assessors in verbalizing their haptic perceptions of fabrics. In the food domain, Lawless et al. (1995) found that several attributes used to describe groups of cheeses were significant when regressed through the MDS space but that cheese expert assessors had a larger number of significant attributes. Saint-Eve et al. (2004), writing about yoghurts, as well as Lim and Lawless (2005), writing about taste solutions, found that some consensus in description was possible, but all these authors also showed that untrained assessors did not agree on the verbal labeling of the groups of products and that several of their terms were idiosyncratic. Along the same lines, Piombino, Nicklaus, Le Fur, Moio, and Le Quéré (2004) underlined the heterogeneity of the criteria used by assessors to characterize their groups of wines. The authors explained that, among other reasons, this heterogeneity could be linked to a lack of training in the identification and description of odors. Moreover, it has already been shown with other sensory methods, such as matching or description tasks, that the attributes generated by consumers are more ambiguous, more redundant and less specific than the attributes generated by trained assessors (Chollet & Valentin, 2001; Chollet & Valentin, 2006; Chollet, Valentin, & Abdi, 2005; Clapperton & Piggott, 1979; Gains & Thomson, 1990; Guerrero, Gou, & Arnau, 1997; Sokolow, 1998; Solomon, 1990).

Another aspect never addressed in the literature is the difficulty of analyzing the vocabulary used by assessors, especially consumers, to describe their groups of products. In fact, in all the studies using a sorting task, the number of terms quoted by the assessors was very large and the descriptions varied a lot from one untrained assessor to another. Moreover, assessors spontaneously qualified their attributes with various quantitative terms such as “very,” “many,” “slightly,” etc. It is therefore often necessary to preprocess the attributes before projecting them onto the MDS maps by grouping similar terms, eliminating hedonic and idiosyncratic terms and keeping only terms cited by more than a few assessors (Cartier et al., 2006; Faye et al., 2004; Faye et al., 2006; Soufflet et al., 2004). This preprocessing requires time and can lead to a loss of information because it depends upon the subjectivity of the sensory analyst.

In the literature, the sorting task associated with a description performed by untrained assessors is presented as an interesting descriptive tool, but is this method really valid for describing products? In order to be used for different industrial applications, the information from product descriptions has to be clearly interpretable and valid. If a description reflects the sensory properties of a given product, then this product should be matched to this description. In this study, we were interested in examining the validity of the product descriptions obtained via a sorting task associated with a description. Trained and untrained assessors performed a sorting task with description followed by a matching task on nine commercial beers. The matching technique has already been used by several authors, especially in the wine domain, to evaluate expert descriptions. Lehrer (1975), followed by Lawless (1984), reported that experts were not really better at matching descriptions than untrained assessors. In contrast, Solomon (1990) found that experts clearly outperformed untrained assessors, whereas Gawel (1997) showed that untrained experienced assessors were able to outperform trained experienced assessors when they matched consensual expert descriptions. In the beer domain, Chollet and Valentin (2001) found that trained and untrained assessors performed the matching task equally well, even if trained assessors were better on supplemented beers and untrained ones on commercial beers. In this study, the matching task was used to test the validity of the sorting task for describing beers, as was already done for the quantitative descriptive profile (O’Neill et al., 2003; Sauvageot & Fuentès, 2000). The validity of the sorting task was studied in a condition where assessors freely described their groups and in a condition where assessors had to choose their terms from a list (Hughson & Boakes, 2002; Lawless, 1988). By using these two conditions, we wanted to test whether the use of a list of terms could help assessors, especially untrained assessors, to provide more relevant descriptions of beers.

Table 1
List of the 44 terms used for the second condition (from Meilgaard et al., 1979)

 1. Alcoholic        23. Sulfidic
 2. Solvent like     24. Cooked vegetable
 3. Estery           25. Yeast
 4. Fruity           26. Stale
 5. Acetaldehyde     27. Catty
 6. Floral           28. Papery
 7. Hoppy            29. Leathery
 8. Resinous         30. Moldy
 9. Nutty            31. Acidic
10. Grassy           32. Acetic
11. Grainy           33. Sour
12. Malty            34. Sweet
13. Worty            35. Salty
14. Caramel          36. Bitter
15. Burnt            37. Alkaline
16. Phenolic         38. Mouthcoating
17. Fatty acid       39. Metallic
18. Diacetyl         40. Astringent
19. Rancid           41. Powdery
20. Oily             42. Carbonation
21. Sulfury          43. Warming
22. Sulfitic         44. Body


2. Material and methods

2.1. Assessors

2.1.1. Trained assessors

Thirteen assessors (5 women and 7 men) aged between 25 and 53 years (mean age = 34.9 years, SD = 9.2 years) participated. Assessors were staff members from the Catholic University of Lille (France). They had been trained one hour per week for two to five years (depending on the assessor; mean = 3.4 years, SD = 1.6 years) to detect and identify flavors (almond, banana, butter, caramel, cabbage, cheese, lilac, metallic, honey, bread, cardboard, phenol, apple, and sulfite) added to beer and to evaluate, using a non-structured linear scale, the intensity of general compounds (bitterness, astringency, sweetness, alcohol, hop, malt, fruity, floral, spicy, sparklingness, and lingering).

2.1.2. Untrained assessors

Two different groups of untrained assessors, who were students and staff members of the University of Bourgogne (France), participated. Group A consisted of 19 assessors (6 women and 13 men) aged between 22 and 56 years (mean age = 26.6 years, SD = 8.0 years). Group B consisted of 18 assessors (19 women and 9 men) aged between 21 and 31 years (mean age = 24.6 years, SD = 2.4 years). They were beer consumers but did not have any formal training or experience in the description of beers.

2.2. Products

Nine different commercial beers were evaluated (denoted PelfBL, PelfA, PelfBR, ChtiBL, ChtiA, ChtiBR, LeffBL, LeffA and LeffBR). These beers came from three different breweries: Pelforth (noted Pelf), Chti (Chti) and Leffe (Leff), and each brewery provided three types of beer: blond (BL), amber (A) and dark (BR). All beers were presented in three-digit coded black plastic tumblers and served at 10 °C.

2.3. Experiment

Subjects took part individually in the experiment in a single session. The experiment was conducted in separate booths lit by an 18 W neon light with a red filter darkened with black tissue paper to mask the color differences between beers. Mineral water and bread were available for assessors to rinse between samples. Assessors could spit out beers if they wanted.

The experiment consisted of two parts. The first one was a sorting task and the second a matching task. These two parts are explained below.

Part 1. Sorting task with description: The assessors received the entire set of beers. The order of presentation of the samples was determined according to a Latin square. Panelists were first required to smell and taste each sample once in the proposed order. Afterward, they were allowed to smell and taste samples as many times as they wanted and in any order. No criterion was provided to perform the sorting task. Assessors were free to make as many groups as they wanted and to put as many beers as they wanted in each group. They were allowed to take as much time as they wanted. After they had finished their sorting task, the assessors were asked to describe each group of beers with some words according to two conditions. In the first condition, assessors were free to use their own words. In the second condition, assessors had to choose their words from a list of 44 terms which were extracted from the Flavor Wheel of the International Terminology System for Beer (Meilgaard, Dalgliesh, & Clapperton, 1979) (see Table 1).

Because we had only one group of trained assessors, we used a within-subject design (all trained assessors performed the experiment in the two conditions, without and with the list of terms), whereas for untrained assessors we used a between-subject design (group A performed the task in the condition without the list and group B in the condition with the list). In both conditions (without and with the list), assessors were told to use no more than five words per group of beers and to indicate the intensity of the descriptors using a four-point scale labeled “not,” “a little,” “medium” and “very.” Assessors did not know that they would have to describe their beer groups when they performed the sorting task. Also, they could not change the beer groups they had just made.

Part 2. Matching task: After a 20-min break, assessors received the nine beers again and were provided with the sets of terms they had just used to describe their beer groups. They were not informed that the beers were the same as the ones used for the sorting task. They were asked to match each beer with a set of terms. The instructions indicated that one beer could be associated with only one set of descriptive terms and that assessors were not obliged to use all the sets of terms (some sets of terms could be associated with no beer). When they performed the sorting task, assessors did not know that they would have to match their descriptions later on.

2.4. Data analysis

2.4.1. Sensory map of the products

For each assessor, the results of the sorting task were encoded in an individual distance matrix where the rows and the columns are beers and where a value of 0 between a row and a column indicates that the assessor put the beers together, whereas a value of 1 indicates that the beers were not put together. For each group of assessors (trained, untrained group A and untrained group B) and each condition (without and with the list), the individual distance matrices obtained from the sorting data were analyzed using DISTATIS (Abdi, Valentin, O’Toole, & Edelman, 2005; Abdi et al., 2007). This method is a generalization of classical multidimensional scaling. DISTATIS takes into account individual sorting data and provides a compromise map for the products, which is an MDS-like map. This product map is obtained from a principal component analysis performed on the DISTATIS compromise cross-product matrix, which is a weighted average of the cross-product matrices associated with the individual distance matrices derived from the sorting data (Abdi et al., 2007). In this map, the proximity between two points reflects their similarity. We also computed Rv coefficients between trained and untrained assessors’ configurations in the two conditions, with and without the list. The Rv coefficient measures the similarity between two configurations and can be interpreted in a manner analogous to a squared correlation coefficient (Abdi, 2007).
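As a minimal sketch of the encoding described in Section 2.4.1, the Python below turns one assessor's sort into a 0/1 distance matrix, double-centres it into a cross-product matrix, and computes an Rv coefficient between two such matrices. It does not implement DISTATIS itself (there is no weighting of assessors, no normalization and no compromise map), and applying Rv directly to individual cross-product matrices is an illustrative simplification rather than the procedure used in the study. The function names and the example sorts are hypothetical; only the beer codes come from the text.

```python
from math import sqrt

def sorting_to_distance(groups, products):
    """0/1 distance matrix: 0 if two products were sorted together, 1 otherwise."""
    label = {}
    for g, members in enumerate(groups):
        for p in members:
            label[p] = g
    return [[0 if label[a] == label[b] else 1 for b in products] for a in products]

def cross_product(dist):
    """Double-centre a distance matrix: S = -1/2 * C D C, with C = I - (1/n) 1 1^T."""
    n = len(dist)
    row_mean = [sum(row) / n for row in dist]
    col_mean = [sum(dist[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (dist[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]

def rv_coefficient(s1, s2):
    """Rv = trace(S1 S2) / sqrt(trace(S1 S1) * trace(S2 S2)) for symmetric matrices."""
    def inner(a, b):
        # For symmetric matrices, trace(A B) equals the sum of elementwise products.
        return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inner(s1, s2) / sqrt(inner(s1, s1) * inner(s2, s2))

# Hypothetical sorts of four of the nine beers by two imaginary assessors.
products = ["PelfBL", "PelfA", "ChtiBL", "ChtiA"]
by_brewery = [["PelfBL", "PelfA"], ["ChtiBL", "ChtiA"]]   # grouped by brewery
by_type = [["PelfBL", "ChtiBL"], ["PelfA", "ChtiA"]]      # grouped by beer type

d_brewery = sorting_to_distance(by_brewery, products)
s_brewery = cross_product(d_brewery)
s_type = cross_product(sorting_to_distance(by_type, products))

rv_same = rv_coefficient(s_brewery, s_brewery)   # identical sorts
rv_cross = rv_coefficient(s_brewery, s_type)     # unrelated sorting criteria
```

In this toy case two identical sorts give an Rv of 1, while sorting by brewery versus sorting by type gives an Rv of 0, which illustrates why Rv can be read in a way analogous to a squared correlation.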

2.4.2. Analysis of the vocabulary

Each assessor described each group of beers with words. For each assessor, the terms given for a group of products were associated with each beer of the group. We assumed that all the beers belonging to the same group were described by the terms in the same way. We began by regrouping the synonyms. Then we converted each intensity word into a score in order to obtain an intensity score for each term quoted to describe the groups of beers: “not” = 0, “a little” = 1, “medium” = 2 and “very” = 3. Then, in order to analyze the vocabulary used by trained and untrained assessors, we computed the geometric mean for each quoted term and each beer for trained and untrained assessors as described in Dravnieks (1982):

$$M = \sqrt{F \times I}$$

where F is the frequency of quotation of each term, calculated by dividing the number of times the term was quoted with an intensity different from zero by the maximum number of quotations for a term (number of assessors), and I is the intensity for each quoted term, computed as the sum of the intensities for the term divided by the maximal intensity for a term (number of assessors multiplied by the maximum score for a term). The geometric mean is expressed as a percentage. Only terms having a geometric mean higher than or equal to 20% for at least one product were considered. The geometric means of these terms were then projected onto the compromise spaces for trained and untrained assessors in the two conditions (without and with the list), according to the method described in Abdi et al. (2007).
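The geometric mean of Section 2.4.2 can be illustrated with a short Python sketch, assuming the formula is the square root of F times I expressed as a percentage. The function name, the panel size and the example scores are hypothetical; the 0 to 3 scoring and the 20% cut-off are those given in the text.

```python
from math import sqrt

MAX_SCORE = 3        # "very" on the four-point scale (not = 0 ... very = 3)
THRESHOLD = 20.0     # terms are kept if M >= 20% for at least one product

def geometric_mean_percent(scores, n_assessors, max_score=MAX_SCORE):
    """Geometric mean M = sqrt(F * I), in percent, for one term and one beer.

    scores: intensity scores given to the term for this beer, one per assessor
            who quoted it; assessors who did not quote the term are simply
            absent from the list (they contribute 0 to both F and I).
    """
    f = sum(1 for s in scores if s > 0) / n_assessors   # frequency of quotation
    i = sum(scores) / (n_assessors * max_score)         # relative intensity
    return 100.0 * sqrt(f * i)

# Hypothetical example: 4 of 10 assessors quote "bitter" for one beer,
# with intensities very, very, medium, a little.
m_bitter = geometric_mean_percent([3, 3, 2, 1], n_assessors=10)
keep_bitter = m_bitter >= THRESHOLD
```

Here F = 4/10 and I = 9/30, so M is about 34.6% and the term would pass the 20% cut-off for this beer.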

2.4.3. Evaluation of the validity of the vocabulary

To study the validity of the vocabulary used by trained and untrained assessors to describe their groups of beers, we examined the results of the matching task. We assumed that if assessors were able to make the same groups of beers from their descriptions as they did during the sorting task, then the terms they used to describe their groups of beers were valid. We computed the number of correct matches, which corresponds to the number of times a beer was matched with the right description written during the sorting task. For convenience, the results are expressed as the percentage of correct matches. We computed Student t-tests between the means of the percentages of correct matches for the assessors and the means of the percentages of correct matches expected by chance. The percentage of correct matches to be expected by chance was different for each assessor because the number of descriptions differed from one assessor to another, depending on the number of sorting groups. This percentage for an assessor was computed as (1/number of descriptions of the assessor) × 100. In order to study the effect of training (trained/untrained) and the use of a list of terms (without/with the list) on the validity of the vocabulary, Student t-tests were also performed on the means of the percentages of correct matches. Differences were considered significant at the alpha = 0.05 level.

3. Results

Fig. 1 shows the compromise maps obtained for trained and untrained assessors’ sorting results. Terms (only the ones with a geometric mean higher than or equal to 20%) are plotted onto these maps for the two conditions without and with the list.

3.1. How did trained and untrained assessors categorize beers?

As shown in Fig. 1, on the whole, trained and untrained assessors categorized the nine beers in the same way. These observations were confirmed by the large values of the Rv coefficients computed between trained and untrained assessors’ configurations, which were significant for the two conditions without (Rv = 0.71, p < 0.05) and with the list of terms (Rv = 0.65, p < 0.05). There is a clear separation of the beers by brewery. The three Chti beers are opposed to the three Leffe beers on the first dimension, which explained 44% of the total variance. The three Pelforth beers are a little less well clustered. They are spread between the Chti and the Leffe beers on the first axis. They are opposed to the Chti and Leffe beers on the second dimension for untrained assessors and are more mixed with the two other breweries for trained assessors. However, these differences between trained and untrained assessors for the Pelforth beers should be interpreted with caution since axis 2 explains only a relatively small amount of the total variance (12% for trained and 9% for untrained assessors).

Fig. 1. Two-dimensional compromise maps for trained assessors (top panel) and untrained assessors (bottom panel) for their sorting tasks followed by descriptions without the list (on the left) and with the list (on the right). The geometric means of each term are plotted onto the compromise spaces.

3.2. How did trained and untrained assessors describe the groups of beers?

3.2.1. Expertise level effect

Without any list of terms, we clearly observe a larger number of descriptors with a geometric mean above 20% for trained assessors: there were only three terms out of 54 with a geometric mean higher than 20% for untrained assessors, while there were eight out of 35 for trained assessors. The terms fruity and bitter were common to the descriptions of the two groups of assessors, but only bitter was used to describe the same beers (Leffe beers). Globally, the descriptions of the groups of beers were different for trained and untrained assessors without the list. In the condition with the list, the number of descriptors was quite similar for trained (10 terms out of 27) and untrained assessors (9 terms out of 34), and seven terms were common to their descriptions (malty, sweet, burnt, bitter, caramel, alcoholic and fruity). Only bitter (for the three Leffe beers) and fruity (for LeffBL) were used to describe the same beers for the two groups of assessors.

3.2.2. List effect

If we compare the two conditions without and with the list for trained assessors, we find some common points: the terms alcohol, sweet, bitter, caramel, floral and fruity were common to both descriptions. In the two conditions, trained assessors described Leffe beers as sweet, fruity, bitter and caramel. However, we can note some differences. For example, trained assessors characterized ChtiBL with the term butter only in the condition without the list. Also, they described PelfA with floral without the list and with astringent and alcohol with the list. Along the same lines, ChtiBR was characterized using the attribute coffee without the list and as metallic and malt with the list. Concerning untrained assessors, we observe that they used many more terms with the list than without the list. For example, with the list, they described beers with terms such as hop, malt, caramel, alcoholic, burnt, sweet, or smooth. Two terms were common to the two descriptions without and with the list: bitter and fruity, but only bitter characterized the same beers in the two conditions (Leffe beers). Moreover, a more detailed analysis of the raw data shows that the terms hop and malt were used by untrained assessors to describe all of the nine beers, whereas trained assessors never used hop to describe the beers and malt was only used for ChtiBL.

3.2.3. Quantitative terms

We examined how trained and untrained assessors used the four quantitative words: “not,” “a little,” “medium” and “very.” We found that trained assessors used the word “very” twice as often as “a little.” In contrast, untrained assessors used the three terms “a little,” “medium” and “very” in a similar way. Moreover, untrained assessors used the word “not” to characterize their descriptors more frequently (20 times) than trained assessors (5 times) did (χ² = 9, d.f. = 1, p < 0.01).

3.3. What is the validity of the terms used by trained and untrained assessors?

Student t-tests showed that the results of trained assessors were significantly better than chance when assessors matched their descriptions for the two conditions (Average (without the list) = 54.7%, t(12) = 2.82, p < 0.01; Average (with the list) = 59.0%, t(12) = 4.39, p < 0.001), as were the results of untrained assessors (Average (without the list, group A) = 50.9%, t(18) = 4.49, p < 0.001; Average (with the list, group B) = 48.1%, t(17) = 4.10, p < 0.001).

Student t-tests did not detect a difference between the two conditions without and with the list for trained assessors (t(12) = 0.50, ns) or for untrained assessors (t(35) = 0.36, ns). In the same way, there was no statistically significant difference between the two groups of assessors in the condition without the list (t(30) = 0.36, ns) or in the condition with the list (t(29) = 1.28, ns). So there was no statistically significant difference in the validity of the vocabulary either between trained and untrained assessors or between the two conditions (without/with the list). However, this failure to show any significant effect can be explained by the large inter-individual variability of the results.

Fig. 2 shows the box plot of the distributions of the percentage of correct matches for trained and untrained assessors in the two conditions (without and with the list). The box extends from the first to the third quartile, the line across the box represents the median, the plus sign represents the mean value and the ends of the lines extending from the box (“whiskers”) indicate the maximum and the minimum data values, unless outliers are present, in which case the whiskers extend to a maximum of 1.5 times the inter-quartile range (i.e., the length of the box). In our case, the whiskers represent the extreme values. We can see a high inter-individual variability, especially for trained assessors in the condition without the list. A finer-grained analysis of the raw data shows that three trained assessors perfectly succeeded in the matching task (percentage of correct matches = 100%) and two trained assessors did not succeed at all in associating the beers with their descriptions (percentage of correct matches = 0%).
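The comparison with chance described in Section 2.4.3, which underlies the t-tests reported above, can be sketched in Python as follows. The text does not spell out the exact form of the test, so treating it as a paired t statistic on per-assessor differences between observed and chance percentages is an assumption, chosen because the chance level differs from one assessor to another. All function names and example values are hypothetical, and the sketch returns only the t statistic and its degrees of freedom (no p-value).

```python
from math import sqrt
from statistics import mean, stdev

def chance_percent(n_descriptions):
    """Chance level for one assessor: (1 / number of descriptions) * 100."""
    return 100.0 / n_descriptions

def percent_correct(matches, truth):
    """Percentage of beers matched with the description of their own sorting group.

    truth:   beer -> id of the description written for its group during sorting
    matches: beer -> id of the description chosen during the matching task
    """
    hits = sum(1 for beer, desc in truth.items() if matches.get(beer) == desc)
    return 100.0 * hits / len(truth)

def paired_t(observed, expected):
    """Paired t statistic and degrees of freedom for observed vs. chance percentages.

    Needs at least two assessors and non-identical differences, otherwise the
    standard deviation is undefined or zero.
    """
    diffs = [o - e for o, e in zip(observed, expected)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# One hypothetical assessor who made two groups and mismatched one beer.
truth = {"PelfBL": 0, "PelfA": 0, "ChtiBL": 1, "ChtiA": 1}
matches = {"PelfBL": 0, "PelfA": 1, "ChtiBL": 1, "ChtiA": 1}
pct = percent_correct(matches, truth)

# Four hypothetical assessors: observed percentages and numbers of descriptions.
observed = [60.0, 40.0, 80.0, 50.0]
expected = [chance_percent(k) for k in (3, 4, 2, 5)]
t_stat, df = paired_t(observed, expected)
```

Because an assessor who made two groups has a chance level of 50% while one who made five has a chance level of 20%, comparing raw percentages across assessors without this per-assessor baseline would overstate the performance of assessors who made few groups.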

4. Discussion

In recent years, the use of sorting tasks associated with a description by consumers has become a popular way of describing food and non-food products. This approach has proved useful for obtaining a coarse description of products (Blancher et al., 2007; Cartier et al., 2006; Faye et al., 2004; Faye et al., 2006; Saint-Eve et al., 2004; Tang & Heymann, 1999), but can it be considered a plausible alternative to conventional profiling? The information conveyed by product descriptions has numerous applications in product development, quality control and the understanding of consumer preferences. Because of these important and widespread applications, this information needs to be clearly interpretable, reliable and valid. To this end, a product description should convey the sensory properties of the product it represents in such a way that a product can be matched to its corresponding description. In this study, we examined whether product descriptions obtained via a sorting task associated with a description could meet this requirement. We compared the performance of trained and untrained assessors in two description conditions (without and with a list of terms).

Fig. 2. Box plot of percentage of correct matches distributions calculated for trained and untrained assessors in the two conditions without (black boxes) and with the list (white boxes), for the matching task.
