Open the MDS data-set in SPSS.

a. Using the ALSCAL program, assume that the data are metric (interval) and compute the distance matrix between pairs of sports for the 50 respondents with two-dimensional MDS


b. Which sports are the most similar according to consumer perceptions? Which ones are the most different?

c. Save the final co-ordinates and compute the bivariate correlations with the expert panel evaluations

d. Try the three-dimensional configuration. Which solution works better according to the STRESS value and R²?

e. Repeat the analysis, this time looking for similarities between respondents for the ten sports on two-dimensions

f. How many clusters of consumers appear from the graph?

STATISTICAL MEASUREMENTS,

ANALYSIS AND RESEARCH

Instructor: Amreeta Choudhury

M4:2A Lecture 11 Part 1

Session Objectives

• Discuss Preferences, perceptions and multidimensional scaling

• Run multidimensional scaling

• Summarize principles and applications of correspondence analysis

• Theory and techniques of correspondence analysis

• Running correspondence analysis

• Use in Marketing including the following:

– Translate preference orderings and consumer perceptions toward products

(objects) into graphical representations, Perceptual maps, etc.

• Introduce Final Project

Multidimensional scaling

• a set of statistical techniques which allow one to

1. translate consumer preferences or perceptions towards

products or brands into a reduced number of dimensions

(usually two or three)

2. represent them graphically in a preference map or perceptual map

• It is also possible to show both objects and subjects (the

consumers) in the same graph through multidimensional

unfolding (MDU)

• MDU is a technique which unfolds the coordinates for

consumers (or groups of consumers) on the basis of their

preferences or perceptions through an ideal point model

Chapter 13

Example of MDS output – holiday destinations in two dimensions

[Figure: common-space plot of London, Paris, Berlin, Amsterdam, Rome, Madrid, Athens, Stockholm and Bruxelles. Horizontal axis: Dimension 1, which may be interpreted as “climate”. Vertical axis: Dimension 2, interpreted as how trendy the city is.]

• Each of the respondents is asked to

rank the cities, without necessarily

specifying why one city was preferred to

another

• Similarities in ranking across an

adequate number of respondents reflect

perceived similarities between cities (e.g.

London is more similar to Berlin than to

Athens)

• Graph distances reflect dissimilarities

• If the two dimensions can be labelled

according to some criterion, as for

principal component or factor analysis,

then it becomes possible to understand

the main perceived differences.


Marketing applications

• Sensory evaluation and new product development

• Example, a company developing a low-salt soup

An evaluation panel is asked to assess a set of existing soup brands according to

several criteria concerning taste, smell, thickness, storage duration, perceived

healthiness and price

Consumers are asked to identify their ideal product in terms of the same

characteristics which may not coincide with one of the existing soups

The final output is a perceptual map displaying both consumer preferences (in terms of their ideal products) and the current positioning of the existing brands

A concentration of consumers’ ideal points identifies a segment (cluster analysis might also be used as a tool to segment respondents)

If no brands appear in the neighbourhood of a segment then there is room for the development of a new product in that area

If the perceptual dimensions have been clearly identified this also allows one to choose the characteristics of the new products.

Example of brand positioning

The two dimensions are the output of some

reduction technique

– PCA or FA for interval (metric) data

– correspondence analysis for non-metric

data

Coordinates for brands are obtained by running PCA (or FA) on sensory assessments (usually through a panel of experts unless objective measures exist)

Consumer positions (as individuals or as

segments) can be defined in two ways

1) using their “ideal brand” characteristics

2) by translating preference ranking for

brands into coordinates through unfolding


Brand positioning

[Figure: perceptual map of five brands and four consumer segments (A to D), with the following annotations]

• Brands 1 and 4 are perceived as similar
• Consumer segment B is close to Brand 3
• Consumer segment D is happy with Brand 2
• Segment A chooses Brand 3, but it is not that close
• Brand 5 survives because of segment C, but it is far from C’s preferences
• There is room for a new product for segment C that is also close to segment A. The product should be healthy, as both A and C like that dimension. The thicker it is, the closer it is to C compared to A

Brand repositioning. If Brand 5 had this marketing research information, it could improve its performance by enhancing the perceived healthiness of the product (e.g. by reducing the salt content and through a targeted advertising campaign). This would move Brand 5 closer to segment C.

Other applications of MDS

• If consumer perceptions are compared through MDS before and after

an advertising campaign aimed at changing perceptions it becomes

possible to measure the success of the advertising effort

• Finally, MDS could be exploited to simplify data interpretation and

provide some prior insight before running psycho-attitudinal surveys.


Running MDS

• MDS is an umbrella term for a set of statistical techniques that produce perceptual or preference maps.

• There is a range of options and choices depending on the type of

MDS data.

• object of the analysis: it can be a product, a brand or any other

target of consumer behaviour, like tourism destinations in the initial

example. The object can be depicted as a set of characteristics,

represented through

• objective dimensions (e.g. salt content in grams)

• subjective dimensions as declared by respondents (subjects) in a

survey


Preferences and perceptions

• With subjective dimensions, consumer evaluations can be based on

preferences or perceptions

• Measurement through preferences (preference map)

• the subjects rank several objects according to their overall evaluations (e.g. ordering of soup brands).

• Measurement through perceptions (perceptual or subjective

dimensions, perceptual map)

• the respondent must attach a subjective value to an object’s feature (e.g. a

rating of the thickness of each soup brand)

• When individual attribute perceptions are measured, respondents may be asked to state the combination of an object’s features that corresponds to their ideal object (to be translated into an ideal point in the spatial map).

• The ideal point can alternatively (and preferably) be derived

through an unfolding statistical model.


Measurement

• Preferences

• rank order scales

• Q-sorting

• other comparative scales.

• Perceptions

• non-comparative scales

Likert

Stapel

Semantic Differential Scales.

• Two types of variables for MDS

• Non-metric variables just reflect a ranking, so that it is not possible to assess whether

the distance between the first and second object is larger or smaller than the distance

between the second and the third.

• Metric variables reflect respondent perception of the distances

• Generally, preference rankings are classified as non-metric and perceptions and

objective dimensions are metric.

• This distinction can be very important, as it leads to two different MDS

approaches.


Non-metric vs. metric MDS

• The output of non-metric MDS aims to preserve the preference

ranking supplied by the respondents

• Metric MDS also takes into account the distances as measured by

perceptions or objective quantities.

• This distinction is often overcome by the use of techniques which

allow one to transform non-metric variables and treat them as if they

were metric, like the PRINQUAL procedure in SAS or correspondence

analysis (see lecture 14)


Multidimensional scaling steps

1. Decide whether mapping is based on an aggregate evaluation of the objects or on the evaluation of a set of attributes (decompositional versus compositional methods)
2. Define the characteristics of the data collection step (number of objects, metric versus non-metric variables)
3. Translate the survey or objective measurements into a similarity or preference data matrix
4. Estimate the perceptual map
5. Decide on the number of dimensions to be considered
6. Label the dimensions and the ideal points
7. Validate the analysis

Decompositional vs. compositional MDS

• Decompositional (attribute-free) approach

• The spatial maps reflect the subject evaluations

• Comparisons of the objects in their integrity

• Advantages: respondent assessment is easier, it is possible to obtain a separate

perceptual map for each subject or for homogeneous groups of subjects

• Limits: no specific information on the determinants of the relative position of the

objects. It is not possible to plot both the objects and the subjects in the same map. It

is difficult to label the dimensions (labels are based on the researcher’s knowledge

about the objects)

• Compositional (attribute-based) approach

• Subjects assess a set of attributes

• Preferred when it is relevant to describe the dimensions and explain the positioning of

objects and subjects in the perceptual map

• Requirements: all the relevant attributes must be considered while avoiding including

irrelevant ones; the combination of attributes must be adequate to reflect the overall

object evaluation.

• The method to be used depends on the chosen approach


Objects and variables

• The higher the number of objects the more accurate the output of MDS in statistical

terms

• However, data quality suffers because it might be difficult for subjects to provide a large number of comparisons.

• The number of objects required for the analysis increases with the number of

dimensions being considered

• For two-dimensional MDS it is advisable to have at least ten objects

• For three-dimensional MDS it is advisable to have about fifteen objects

• As the number of objects increases, goodness-of-fit measures become less reliable.

• Measurement through metric or non-metric variables

• The starting matrix for MDS is different

• With non-metric data (ordinal variables or paired comparison data) the initial data matrix

only considers ranking and not the distance between the objects

• With metric variables the matrix preserves the distances observed in the subject

evaluations.

• Most MDS methods can also deal with mixed data sets containing both metric and non-metric data

Data matrix

• Data for MDS are similarities between objects or preference (ranking) of

objects

• Decompositional approach: a matrix for each subject exists, which translates

into a matrix comparing all objects

• Compositional approach: a matrix for each subject and attribute exists, which translates into a matrix comparing all objects for each attribute

• Similarity data: the subject compares all pairs of objects and ranks the pairs

in terms of their similarity (usually this leads to non-metric MDS)

• The similarity (or dissimilarity) matrix can be also computed from metric

evaluation (rating) of the objects

• Compositional approach: summarize (e.g. through averaging) the distances between the

objects across the subjects, assuming all subjects have the same weight

• Decompositional approach: a synthetic measure of similarity between objects is computed

for each subject (weights can be used if available and appropriate) then the similarity

matrix across the subjects is derived
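As a rough illustration of how metric ratings can be turned into a dissimilarity matrix, the Python sketch below computes Euclidean distances between objects from a small ratings table. The ratings, the function name `dissimilarity_matrix` and the choice of the Euclidean distance are illustrative assumptions and are not taken from the MDS data set.

```python
import numpy as np

def dissimilarity_matrix(ratings):
    """Euclidean distances between objects.

    ratings: array of shape (n_objects, n_subjects), one row per object.
    Returns a symmetric (n_objects, n_objects) matrix with a zero diagonal.
    """
    ratings = np.asarray(ratings, dtype=float)
    # Differences between every pair of rows (objects), subject by subject
    diff = ratings[:, None, :] - ratings[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical ratings of three objects by four subjects
ratings = np.array([
    [1, 2, 3, 4],
    [1, 2, 3, 4],   # rated exactly like the first object
    [4, 3, 2, 1],
])
D = dissimilarity_matrix(ratings)
print(np.round(D, 3))
```

Objects rated identically by all subjects end up at distance zero, while objects with opposite ratings are far apart. A matrix of this kind is what the estimation step takes as input.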


Estimation

• Estimation starts from a proximity or similarity matrix and produces

a set of n-dimensional coordinates

• Distances in this n-dimensional space reflect as closely as possible

the distances recorded by the proximity matrix

• Metric scaling is based on a proximity matrix derived from metric

data

• Non-metric scaling projects dissimilarities based on ranking (ordinal

variables) preserving the order emerging from the subjects’

preferences

• Non-metric scaling should also be applied to metric distances when

the researcher suspects that collected data might be affected by

relevant measurement errors (e.g. when respondents may

encounter difficulties in stating their perceptions with precision

while ordering can be regarded as more reliable)


Metric scaling

• With metric variables, one might apply FA (or PCA) to reduce the dimensions and

obtain the scores which represent the coordinates. However, there is a difference

• Coordinates obtained from PCA and FA are the best representation of the original data

matrix in terms of variability

• Metric scaling coordinates ensure that the distance between two points is as close as

possible to the distance as measured in the proximity matrix

• Classical MultiDimensional Scaling technique (CMDS) also known as principal

coordinate analysis

• Decompositional approach (unique similarity matrix comparing all objects)

• The proximity or similarity matrix is obtained by applying the Euclidean distance on the

data matrix (or other distance measures as those for cluster analysis).

• The objective of CMDS is to extract an n-dimensional configuration of points whose distances $d_{ij}^{*}$ are as close as possible to the original distances $d_{ij}$ according to the following quadratic criterion

$$\sum_{i=2}^{p} \sum_{j=1}^{i-1} \left( d_{ij}^{2} - d_{ij}^{*2} \right)$$
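The classical scaling step can be sketched in Python as follows. This is a minimal illustration of principal coordinate analysis (double-centring of the squared distances followed by an eigen-decomposition), not the routine used by SPSS. The function name `classical_mds` and the four example points are assumptions made for the example.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical (metric) MDS, also known as principal coordinate analysis.

    D: symmetric (p, p) matrix of distances between p objects.
    Returns a (p, n_dims) array of coordinates.
    """
    D = np.asarray(D, dtype=float)
    p = D.shape[0]
    # Double-centre the squared distances to obtain a cross-product matrix
    J = np.eye(p) - np.ones((p, p)) / p
    B = -0.5 * J @ (D ** 2) @ J
    # Eigen-decomposition; keep the n_dims largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)

# Hypothetical example: four points on a plane, recovered from distances only
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
coords = classical_mds(D, n_dims=2)
D_star = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
print(np.allclose(D, D_star))
```

The recovered coordinates may be rotated or reflected with respect to the original points, but the distances between them are reproduced. This is the property that matters for a perceptual map.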


Non-metric scaling

• Ordinal variables (preference data)
• Coordinates are obtained through computational algorithms
• Many procedures exist. The original method (Shepard-Kruskal) is as follows
– given a number of dimensions n, the p objects are represented through an arbitrary initial set of coordinates
– a function S is defined to measure how distant the current set of coordinates is from the original ordering (monotonicity requirement)
– using an iterative numerical algorithm, the values that minimize S are found
• The procedure can be extended to include a search for the optimal number of dimensions n.
• Other algorithms:
– ALSCAL (SPSS & SAS)
– Algorithms in the MDS procedure in SAS
– INDSCAL

Goodness-of-fit and STRESS

• The STRESS measure (STandardized REsiduals Sum of

Squares) is a function of the original and derived

distances to evaluate the goodness-of-fit of a MDS

solution:

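$$\mathrm{STRESS} = \sqrt{\frac{\sum_{i=1}^{p-1} \sum_{j=i+1}^{p} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}{\sum_{i=1}^{p-1} \sum_{j=i+1}^{p} d_{ij}^{2}}}$$

where $d_{ij}$ are the original distances and $\hat{d}_{ij}$ the derived ones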

• The smaller the stress function, the closer are the derived

distances to the original ones
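A small Python sketch may help to see how the measure behaves. It assumes the square-root form of the ratio (the form of Kruskal's Stress-I) and uses made-up distances. The function name `stress` is hypothetical.

```python
import numpy as np

def stress(d_orig, d_hat):
    """STRESS between original distances d_orig and derived distances d_hat.

    Both arguments are symmetric (p, p) matrices; only pairs with i < j are used.
    Assumes the square-root (Stress-I type) form of the measure.
    """
    d_orig = np.asarray(d_orig, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    iu = np.triu_indices_from(d_orig, k=1)  # index pairs with i < j
    num = ((d_orig[iu] - d_hat[iu]) ** 2).sum()
    den = (d_orig[iu] ** 2).sum()
    return np.sqrt(num / den)

# Hypothetical distances between three objects
d_orig = np.array([[0, 3, 4], [3, 0, 5], [4, 5, 0]])
d_hat = np.array([[0, 3, 4], [3, 0, 4], [4, 4, 0]])
print(round(stress(d_orig, d_hat), 4))
```

When the derived distances coincide with the original ones the function returns zero, and the value grows as the configuration departs from the input distances.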


STRESS and number of dimensions

• The STRESS value decreases as the number of dimensions increases

• The number of dimensions can be evaluated through a scree diagram

of STRESS against the number of dimensions (as for FA, PCA or cluster

analysis) where the optimal number corresponds to an elbow

• The preferred number of dimensions is usually two or three which

allows for graphical examination

• The search usually goes from one to five dimensions

• Identification of the optimal number within the metric and non-metric

iterative algorithm

• An additional step evaluates the STRESS function

• The algorithm stops when the addition of a further dimension does not reduce

the STRESS value to a perceptible extent

• With two dimensions a STRESS value below 0.05 is generally

considered to be satisfactory.


Labelling dimensions

• Interpretable dimensions (attaching a meaning to coordinates)

enhance the use of MDS maps (e.g. new product development)

• Interpretation may be difficult

• Compositional approaches (or cases where attribute ratings are otherwise available) allow for more objective methods based on the relative weight of each attribute (something similar to factor loadings)

Ideal points

• Objective: position ideal points (for each subject) and the

actual brand evaluations (the objects) within the same

map

• Ideal point: set of coordinates which represents the

stated optimal combination of attributes (under the

compositional approach)

• If no precise statement is made by the subject it is still

possible to locate the ideal point

• Indirect positioning of ideal points is based on a

procedure which ensures that distances of the objects

from a subject’s ideal point reflect the preference

ordering as much as possible


Internal vs. external preference mapping

• Internal Preference Mapping (IPM)

• the proximity matrix for the objects is based on evaluations from consumers. The final map

shows:

products as they are perceived by the consumers

consumers according to their preferences.

• External data (i.e. objective measures or expert evaluations) can be used to interpret the

dimensions but not to draw the map

• External Preference Mapping (EPM)

• the proximity matrix contains objective (analytic) measures of product characteristics (or

evaluations from expert panels)

• The maps contain information external to the set of consumers who provide their evaluation of the products

• The final map shows

products as they are evaluated by the external source

consumers according to their preferences


Internal preference mapping

• The ideal point (or vector) for each subject is estimated from the

preference orderings through unfolding

• Example

• four brands (A, B, C and D) are evaluated.

• Consumer one states a preference ordering as D, B, C and A, where D is the

most preferred brand

• Consumer two states the ordering C, B, D, A

• The ideal point for Consumer one will be closer to D and far away from A,

while for Consumer two the ideal point will probably be still far away from A

but closer to C than to D.

• The distance of the ideal point from the objects in the product space should

reflect as much as possible the ordering of the consumer preferences
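The idea that distances from an ideal point should reproduce a preference ordering can be checked with a few lines of Python. The brand coordinates and the two ideal points below are invented for illustration. They are not the output of an unfolding model, and `ordering_from_ideal_point` is a hypothetical helper.

```python
import numpy as np

def ordering_from_ideal_point(ideal, coords, labels):
    """Return object labels sorted from nearest to farthest from the ideal point."""
    coords = np.asarray(coords, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    dist = np.linalg.norm(coords - ideal, axis=1)
    return [labels[i] for i in np.argsort(dist)]

# Hypothetical two-dimensional coordinates for brands A, B, C and D
labels = ["A", "B", "C", "D"]
coords = np.array([[-2.0, 0.0], [0.5, 0.5], [1.0, -1.0], [1.0, 1.5]])

# Hypothetical ideal points for the two consumers
ideal_one = np.array([1.0, 2.0])
ideal_two = np.array([1.0, -0.8])

order_one = ordering_from_ideal_point(ideal_one, coords, labels)
order_two = ordering_from_ideal_point(ideal_two, coords, labels)
print(order_one)  # ['D', 'B', 'C', 'A']
print(order_two)  # ['C', 'B', 'D', 'A']
```

With these made-up positions, the first ideal point reproduces the ordering D, B, C, A and the second reproduces C, B, D, A. An unfolding algorithm searches for ideal points (and object coordinates) with this property rather than having them supplied by hand.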


IPM and unfolding

• The ideal product is not necessarily a precise point in the

preference map but could be represented as a line (or an arrow)

going from the least preferred objects towards the most preferred

ones

• Unfolding approach

• Decompose all preference orderings for a given set of objects (products) so

that the same products can be represented in a lower dimensional space

• Once the products are positioned on the preference map it is possible to see

where the subjects (consumers) are positioned

• While the dimensions reflect some product characteristics that are the same

for all consumers each consumer attaches a different weight to those

dimensions

• Consumers have different ideal points because they place a different weight

on the dimensions


External preference mapping

• EPM follows a different philosophy from IPM

– It strictly requires the use of perceptual (metric) data

– Evaluations of the product characteristics are on a measurement scale rather than their

simple ordering

– Measurements are usually based on analytic or objective evaluations or expert

evaluations (external to the set of consumers which provide their product evaluation).

• The input matrix contains the (quantitative) measurements of all

attributes for each product.

• A data reduction technique (usually PCA) allows one to attach a set of coordinates (the principal component scores) to each of the products

• The principal components define the dimensions of the map and

can be interpreted (labelled) through the component loadings.

• An algorithm (e.g. PREFMAP) allows one to elicit the position of

subjects (consumers) in the map.
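The data reduction step of EPM can be sketched as below: principal component scores computed from a products-by-attributes matrix serve as map coordinates. The measurements are invented, the function name `pca_scores` is hypothetical, and standardizing the attributes first is an assumption of this example. The positioning of consumers (e.g. through PREFMAP) is not covered.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Principal component scores of a (products x attributes) matrix.

    Returns the scores, the attribute weights (eigenvectors) of each
    component and the share of variance explained by each component.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each attribute (PCA on the correlation matrix)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the component directions
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    weights = Vt[:n_components].T
    explained = (s ** 2) / (s ** 2).sum()
    return scores, weights, explained[:n_components]

# Hypothetical expert measurements: five products, three attributes
X = np.array([
    [2.0, 7.0, 1.5],
    [3.0, 6.0, 2.0],
    [5.0, 4.0, 3.5],
    [6.0, 3.0, 3.0],
    [8.0, 1.0, 5.0],
])
scores, weights, explained = pca_scores(X)
print(np.round(scores, 2))     # map coordinates for each product
print(np.round(explained, 3))  # share of variance per component
```

The attribute weights play the role of the information used to label the dimensions: attributes with large weights on a component suggest what that axis of the map represents.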


IPM or EPM?

• Both approaches can be applied to the same data set but they reflect

different philosophies

– a consumer very much likes red full-bodied wine and white sweet and sparkling

wine

– IPM: these two products share similar preferences and will be positioned next to

each other

– EPM: because the product characteristics are very different, the two products will look distant on the perceptual map.

• The choice between IPM and EPM is mainly related to the choice of

prioritizing either the preferences of the subjects (IPM) or the product

characteristics (EPM).

• When many dimensions are chosen the two approaches produce

similar results but with reduced dimensions discrepancies are likely to

emerge


IPM vs. EPM

• IPM is better when

• Perceptual data are inadequate to reflect preferences as it is not necessarily

true that the combination of the product attributes is an adequate

representation for the product

• Physical attributes as they are perceived by the consumer are

processed into a number of perceived benefits and these benefits

are then translated into preferences

• Thus the relative weight of the physical attributes as compared to

the abstraction process could drive the choice between IPM and

EPM.

• For those goods where the cognitive evaluation is mainly based on

the objective attributes EPM seems to be preferable

• Goods where the connection between perceptions and

preferences is not so natural (and affective processes play a major

role) are better analyzed with IPM


MDS in SPSS – the data

• MDS data set

• Fifty individuals (the subjects or consumers) were asked to rank

ten sports (the objects or products) according to their preference

• a panel of expert sport journalists provided an evaluation of the

attributes of each sport (the product characteristics) in terms of

strategy, suspense, physicality and dynamicity

• The final data set (MDS.sav) has one row for each sport and one

column for each consumer plus four columns for the sports’

attributes


The MDS data set

[Figure: view of the MDS data set, with callouts marking the ratings by consumers and the evaluations of product characteristics by experts]

IPM & unfolding

Unfolding

[Figure: SPSS unfolding dialog box with the following callouts]

• Proximities are defined from the subjects’ preference rankings
• A nominal variable provides the labels for the objects (sports)
• When measures for the same set of objects are provided by different sources (e.g. different groups or scenarios), data should be stacked
• Defines model options
• Allows one to place restrictions
• Defines options for the algorithm
• Chooses plots
• Displays and saves additional output

Unfolding options

[Figure: model dialog box with the following callouts]

• Select identity as data come from a single source
• Rankings are dissimilarities and ordinal data
• Number of dimensions to be explored

Options

[Figure: options dialog box with the following callouts]

• Convergence criterion for the STRESS function
• Choose the starting configuration
• The penalty term helps avoid degenerate solutions (where points can hardly be distinguished from each other). The weight of the penalty term increases as the strength becomes smaller. When the penalty range is zero, no correction is made to the Stress-I criterion, while larger range values lead to solutions where the variability of the transformed proximities is higher

Plots

[Figure: plots dialog box with the following callouts]

• The final common space shows subjects and objects on the same plot
• Applies different colors or markers to different objects

Outputs

[Figure: output dialog box with the following callouts]

• Output tables can be selected here
• Output coordinates (distances) can be saved into a new file

Unfolding output

Measures

Iterations                                                                        992
Final Function Value                                                              .3835645
Function Value Parts     Stress Part                                              .0410912
                         Penalty Part                                             3.5803705
Badness of Fit           Normalized Stress                                        .0016885
                         Kruskal’s Stress-I                                       .0410908
                         Kruskal’s Stress-II                                      .1905153
                         Young’s S-Stress-I                                       .0720164
                         Young’s S-Stress-II                                      .1781156
Goodness of Fit          Dispersion Accounted For                                 .9983115
                         Variance Accounted For                                   .9666225
                         Recovered Preference Orders                              .8471837
                         Spearman’s Rho                                           .8617494
                         Kendall’s Tau-b                                          .7273984
Variation Coefficients   Variation Proximities                                    .5043544
                         Variation Transformed Proximities                        .3322572
                         Variation Distances                                      .5071630
Degeneracy Indices       Sum-of-Squares of DeSarbo’s Intermixedness Indices       .4694185
                         Shepard’s Rough Nondegeneracy Index                      .5609796

The final STRESS-I value of 0.04 is acceptable.

Other measures of “badness-of-fit” and “goodness-of-fit” are

provided and confirm that the results are acceptable.

The variation coefficient of the transformed proximities

can be used to check for the risk of degenerated solutions

(points are too close to each other). In this case, the

variation coefficient of the transformed proximities is 0.33 as

compared to the 0.50 of the original ones, which means that

most of the variability is retained after transformation.

Furthermore, the distances show a variability which is more

or less equal to the original one, indicating that the points in

space should be scattered enough to reflect the initial

distances.

DeSarbo’s intermixedness index and Shepard’s rough nondegeneracy index (RNI) also provide warning signals for degenerate solutions: the former should be as close to zero as possible and the latter as close to one as possible. There are no strong signals of a degenerate solution.

One may wish to try different parameters for the penalty

term to see whether these indicators improve.

Plots

[Figures: plot of objects; plot of subjects]

Joint plot

According to the sample, basketball,

baseball and cricket share

similarities in subjects’ perceptions

and so do American football, motor

sports and ice hockey.

A third “cluster” is provided by

handball, waterpolo and volleyball,

while football seems to be

equidistant from all other sports.

Consumers are also grouped in

clusters according to their

preferences and the joint

representation allows one to show

not only which sports (products) are

closer to the preferences of different

segments, but also which sports

need to be repositioned to attract

more public, like the cluster with

volleyball, waterpolo and handball.


Repositioning

• If one can attach a meaning to dimensions one and two it

becomes possible to understand what characteristics of

the products should be changed

• A method to obtain an interpretation of the coordinates consists in looking at the correlations between the coordinates of the sports and the object characteristics that can be measured objectively or through the evaluation of expert panellists.

• The algorithm has created an output file coord.sav which

contains the two coordinates for each sport and

consumer and can be used to obtain the bivariate

correlations
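Outside SPSS, the same bivariate correlations can be obtained with a few lines of Python once the coordinates and the attribute scores are available as arrays. The numbers below are made up and do not come from coord.sav. The function name `dimension_attribute_correlations` is hypothetical.

```python
import numpy as np

def dimension_attribute_correlations(coords, attributes):
    """Pearson correlations between each map dimension and each attribute.

    coords: (n_objects, n_dims); attributes: (n_objects, n_attributes).
    Returns an (n_dims, n_attributes) matrix.
    """
    coords = np.asarray(coords, dtype=float)
    attributes = np.asarray(attributes, dtype=float)
    n_dims = coords.shape[1]
    # corrcoef treats columns as variables when rowvar=False
    full = np.corrcoef(np.hstack([coords, attributes]), rowvar=False)
    return full[:n_dims, n_dims:]

# Hypothetical coordinates (two dimensions) and two attribute scores for five objects
coords = np.array([[-1.0, 0.2], [-0.5, -0.4], [0.0, 0.5], [0.5, -0.6], [1.0, 0.3]])
attributes = np.array([[9.0, 3.0], [7.0, 5.0], [5.0, 4.0], [3.0, 6.0], [1.0, 2.0]])
r = dimension_attribute_correlations(coords, attributes)
print(np.round(r, 3))
```

In this toy example the first attribute decreases linearly along the first dimension, so the corresponding correlation is -1. A strong correlation of this kind is what justifies labelling a dimension after an attribute.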


Labelling dimensions

         DIM_1    DIM_2    Strategy   Suspense   Physicity   Dinamicity
DIM_1    1.000    0.000    -0.839     -0.167     0.130       0.362
DIM_2    0.000    1.000    0.338      -0.180     0.330       0.168

Sports on the left side of the graph are likely to be more strategic, while those

on the right are more dynamic.

Considering the second dimension, as values move towards the top, the sports

are expected to become more physical and strategic, while negative values

seem to indicate a lack of suspense.

Ideally, those who want to bring people closer to volleyball, water-polo or

handball should try and move the points toward the top left area, thus trying to

persuade “consumers” that these sports are more strategic (especially),

dynamic and physical than currently thought.


Field Work


Complete the questions on NYU Home under our Class website. Submit online by next week!


M4:2B Lecture 11 Part 2


Chapter 14

Correspondence analysis

• Multivariate statistical technique which looks into the association of two or more categorical variables and displays them jointly on a bivariate graph

• It can be used to apply multidimensional scaling to categorical variables.

Correspondence analysis

and data reduction techniques

• Factor and principal component analyses are only applied to metric (interval or

ratio) quantitative variables

• Traditional multidimensional scaling deals with non-metric preference and

perceptual data when those are on an ordinal scale

• Correspondence analysis allows data reduction (and graphical representation of

dissimilarities) on non-metric nominal (categorical) variables

• The issue with categorical (non-ordinal) variables is how to measure distances

between two objects: Correspondence analysis exploits contingency tables and

association measures


Example (Trust data)

• Do consumers with different jobs (q55) show preferences for some

specific type of chicken (q6)?

Correspondence Table

Rows: If employed, what is your occupation?
Columns: In a typical week, what type of fresh or frozen chicken do you buy for your household’s home consumption?

Occupation                      ‘Value’    ‘Standard’   ‘Organic’   ‘Luxury’   Active Margin
                                chicken    chicken      chicken     chicken
I am not employed               17         50           10          17         94
Non manual employee             11         74           14          28         127
Manual employee                 6          19           4           8          37
Executive                       0          7            6           14         27
Self employed professional      1          18           7           3          29
Farmer / agricultural worker    1          1            1           0          3
Employer / Entrepreneur         0          4            2           3          9
Other                           11         31           1           1          44
Active Margin                   47         204          45          74         370

Independence

• If the two characters are independent then the numbers in the cells of the table should simply depend on the row and column totals (lecture 9)

• Measure the distance between the expected frequency in each cell and the actual (observed) frequency

• Compute a statistic (the Chi-square statistic) which allows one to test whether the difference between the expected and actual values is statistically significant
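These three steps can be traced on the correspondence table of the Trust example. The Python sketch below is only an illustration: it computes the expected frequencies, the cell contributions and the overall Chi-square statistic, and leaves the significance test (comparison with the Chi-square distribution) aside. The variable names are arbitrary.

```python
import numpy as np

# Observed frequencies from the correspondence table (occupation x type of chicken)
observed = np.array([
    [17, 50, 10, 17],   # I am not employed
    [11, 74, 14, 28],   # Non manual employee
    [6, 19, 4, 8],      # Manual employee
    [0, 7, 6, 14],      # Executive
    [1, 18, 7, 3],      # Self employed professional
    [1, 1, 1, 0],       # Farmer / agricultural worker
    [0, 4, 2, 3],       # Employer / Entrepreneur
    [11, 31, 1, 1],     # Other
], dtype=float)

n = observed.sum()
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)

# Expected frequencies under independence: (row total x column total) / n
expected = row_totals @ col_totals / n

# Cell-by-cell contributions and the overall Chi-square statistic
contributions = (observed - expected) ** 2 / expected
chi_square = contributions.sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(round(chi_square, 2), dof)
```

The cells with the largest contributions are those where the observed frequency departs most from what independence would predict. These are the associations that correspondence analysis later displays on the map.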


Reducing the number of dimensions

• The elements composing the Chi-square statistic are

standardized metric values, one for each of the cells

• They become larger as the association between two

specific characters increases

• These elements can be interpreted as a metric measure

of distance

• The resulting matrix is similar to a covariance matrix

• A method similar to principal component analysis can be

applied to this matrix to reduce the number of

dimensions


Coordinates

• The principal component scores provide standardized values that can

be used as coordinates

• One may apply the same data reduction technique

• first by rows (synthesizing occupation as a function of types of chicken)

• then by column (synthesizing types of chicken as a function of occupation)

• The first two components for each application generate a bivariate

plot which shows both the occupation and the type of chicken in the

same space
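A compact way to obtain both sets of coordinates at once is a singular value decomposition of the standardized residuals from the independence model. The Python sketch below follows this standard formulation of simple correspondence analysis. It is not the SPSS procedure, the 3 x 3 table is invented and the function name `correspondence_analysis` is hypothetical.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Simple correspondence analysis of a two-way frequency table.

    Returns principal row coordinates, principal column coordinates and
    the share of inertia explained by each retained dimension.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # relative frequencies
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals from the independence model
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :n_dims] * s[:n_dims]) / np.sqrt(r)[:, None]
    cols = (Vt[:n_dims].T * s[:n_dims]) / np.sqrt(c)[:, None]
    inertia_share = s ** 2 / (s ** 2).sum()
    return rows, cols, inertia_share[:n_dims]

# Hypothetical 3 x 3 table of counts
table = np.array([[20, 5, 5], [5, 20, 5], [5, 5, 20]])
rows, cols, share = correspondence_analysis(table)
print(np.round(rows, 3))
print(np.round(cols, 3))
print(np.round(share, 3))
```

The sum of the squared singular values equals the Chi-square statistic divided by the sample size (the total inertia), which is why the extracted dimensions can be read as summarizing the association in the table.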


Output from

Correspondence Analysis

[Figure: correspondence analysis plot of occupation and type of chicken. Annotations: the unemployed are closer to “Value” chicken; executives prefer “Luxury” chicken.]

Applications

• It is possible to represent on the same graph consumer preferences for different brands and characteristics of a specific product (e.g. car brands together with colour, power, size, etc.)

• This allows one to explore brand choice in relation to characteristics, opening the way to product modifications and innovations to meet consumer preferences

• Correspondence analysis is particularly useful when the variables have many categories

• The application to metric (continuous) data is not ruled out, but data need to be categorized first

Summary

• Correspondence analysis is a compositional technique which starts from a set of product attributes to portray the overall preference for a brand

• This technique is very similar to PCA and can be employed for data reduction purposes or to plot perceptual maps

• Because of the way it is constructed, correspondence analysis can be applied to either the rows or the columns of the data matrix

• For example, if rows represent brands and columns are different attributes:

1. By applying the method by rows one obtains the coordinates for the brands

2. The application by columns allows one to represent the attributes in the same graph

Steps to run correspondence analysis

• Represent the data in a contingency table

• Translate the frequencies of the contingency table into a matrix of metric (continuous) distances through a set of Chi-square association measures on the row and column profiles

• Extract the dimensions (in a similar fashion to PCA)

• Evaluate the explanatory power of the selected number of dimensions

• Plot row and column objects in the same co-ordinate space
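As a rough illustration of these steps, the sketch below runs a simple correspondence analysis with NumPy. Several things here are assumptions rather than the lecture's method: the 3x3 table of counts is hypothetical, the dimensions are extracted with a singular value decomposition of the standardized residuals (a common way to implement the PCA-like step, not necessarily what SPSS does internally), and both sets of coordinates are returned in principal form.

```python
import numpy as np


def correspondence_analysis(counts):
    """Simple CA of a two-way table via SVD of standardized residuals."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    P = counts / n                       # relative frequencies f_ij
    r = P.sum(axis=1)                    # row masses f_i0
    c = P.sum(axis=0)                    # column masses f_0j
    expected = np.outer(r, c)            # f*_ij under independence
    S = (P - expected) / np.sqrt(expected)

    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    keep = min(counts.shape) - 1         # at most min(rows, cols) - 1 dimensions
    U, sv, Vt = U[:, :keep], sv[:keep], Vt[:keep, :]

    inertia = sv ** 2                    # one eigenvalue per dimension
    row_coords = (U / np.sqrt(r)[:, None]) * sv
    col_coords = (Vt.T / np.sqrt(c)[:, None]) * sv
    return {
        "row_masses": r,
        "col_masses": c,
        "singular_values": sv,
        "inertia": inertia,
        "explained": inertia / inertia.sum(),
        "row_coords": row_coords,
        "col_coords": col_coords,
        "chi_square": n * (S ** 2).sum(),
        "n": n,
    }


# Hypothetical contingency table (rows: categories of x, columns: categories of y).
table = [[20, 10, 5],
         [10, 20, 10],
         [5, 10, 30]]

result = correspondence_analysis(table)
print(result["explained"])        # share of inertia per dimension
print(result["row_coords"][:, :2])  # row points for a two-dimensional map
```

Plotting the first two columns of `row_coords` and `col_coords` on the same axes gives the joint map described in the last step.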


The frequency table

Rows: categorical variable X (k categories). Columns: categorical variable Y (l categories).

                 y1     y2     …     yj     …     yl     Row masses
x1               f11    f12    …     f1j    …     f1l    f10
x2               f21    f22    …     f2j    …     f2l    f20
…
xi               fi1    fi2    …     fij    …     fil    fi0
…
xk               fk1    fk2    …     fkj    …     fkl    fk0
Column masses    f01    f02    …     f0j    …     f0l    1

Row i of the table gives the row profile of xi; column j gives the column profile of yj.

Interpretation of coordinates

• The categories of the x variable can be seen as different coordinates for the points identified by the y variable

• The categories of the y variable can be seen as different coordinates for the points identified by the x variable

• Thus it is possible to represent the x and y categories as points in space, imposing (as in multidimensional scaling) that they respect some distance measure

Representations

• Take the row profile (the categories of x) and plot the categories in a bi-dimensional graph, using the categories of y to define the distances

• This allows one to compare nominal categories within the same variable: those categories of x which show similar levels of association with a given category of y can be considered as closer than those with very different levels of association with the same category of y

• The same procedure is carried out transposing the table, which means that the categories of y can be represented using the categories of x to define the distances

Computing the distances

• When the coordinates are defined simultaneously for the categories of x and y, the Chi-square value can be computed for each cell as follows

• Obtain the expected table frequencies

$$f^*_{ij} = \frac{n_{i0}\, n_{0j}}{n_{00}^2} = \frac{f_{i0}\, f_{0j}}{f_{00}} = f_{i0}\, f_{0j}$$

• Where nij and fij are the absolute and relative frequencies, respectively, ni0 and n0j (or fi0 and f0j) are the marginal totals for row i and column j (the row masses and column masses) respectively and n00 is the sample size (hence the total relative frequency f00 equals one)

• The Chi-square value can now be computed for each cell (i,j)

$$\chi^2_{ij} = n_{00}\, \frac{(f_{ij} - f^*_{ij})^2}{f^*_{ij}}$$

These are the quadratic distances between category i of the x variable and category j of the y variable

The distance matrix

• The matrix χ² measures all of the associations between the categories of the first variable and those of the second one.

• A generalization to the multivariate case (MCA) is possible by stacking the matrix

– Stacking: compose a large matrix by blocks, where each block is the contingency matrix for two variables (all possible associations are taken into consideration)

– The stacked matrix is referred to as the Burt Table

• To obtain similarity values from the χ² matrix:

– compute the square root of the elemental Chi-square values

– use the appropriate sign (the sign of the difference fij – fij*)

– large positive values correspond to strongly associated categories

– large negative values identify those categories where the association is strong but negative, indicating dissimilarity
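The signed square-root step can be sketched as follows. The counts are hypothetical and the function name is invented for this example; the cell-level Chi-square value is computed as the sample size times the squared difference between observed and expected relative frequencies, divided by the expected relative frequency.

```python
import math


def signed_root_residuals(table):
    """Signed square roots of the cell-level Chi-square values."""
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(col) / n for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, count in enumerate(row):
            f = count / n                        # observed relative frequency
            f_star = row_mass[i] * col_mass[j]   # expected under independence
            chi2_ij = n * (f - f_star) ** 2 / f_star
            # Attach the sign of (f - f_star) to the square root.
            out_row.append(math.copysign(math.sqrt(chi2_ij), f - f_star))
        result.append(out_row)
    return result


# Hypothetical counts: rows are categories of x, columns are categories of y.
observed = [[30, 10],
            [20, 40]]

similarity = signed_root_residuals(observed)
for row in similarity:
    print([round(value, 3) for value in row])
```

Positive entries mark pairs of categories that occur together more often than independence would predict; negative entries mark pairs that occur together less often.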


Estimation

• The resulting matrix D contains metric and continuous similarity data

• It is possible to apply PCA to translate such a matrix into coordinates for each of the categories, first those of x, then those of y

• Before PCA can be applied, some normalization is required so that the input matrix becomes similar to a correlation matrix

• The use of the square root of the row (or column) masses for normalizing the values in D represents the key difference from PCA

• The rest of the estimation process follows that of PCA

• As for PCA, eigenvalues are computed, one for each dimension, which can be used to evaluate the proportion of dissimilarity maintained by that dimension

Inertia

• Inertia is a measure of association between two categorical variables based on the Chi-squared statistic.

• In correspondence analysis the proportion of inertia explained by each of the dimensions can be regarded as a measure of goodness-of-fit, because the effectiveness of correspondence analysis depends on the degree of association between x and y

• Total inertia

– is a measure of the overall association between x and y

– is equal to the sum of the eigenvalues

– corresponds to the Chi-square value divided by the number of observations

– A total inertia above 0.20 is expected for adequate representations

• Inertia values can be computed for each of the dimensions and represent the contribution of that dimension to the association (Chi-square) between the two variables


SPSS example

• EFS data set:

– economic position of the household reference person (a093)

– type of tenure (a121)

• Their Pearson Chi-square value is 274, which means significant association at the 99.9% confidence level

Analysis

• Define the range, i.e. the categories for each variable that enter the analysis

• Some categories can be indicated as supplementary: they appear in the graphical representation, but do not influence the actual estimation of the scores

Model options

• Choose the number of dimensions to be retained

• Choice of distance measure

• Standardization (only for Euclidean distance)

• Normalization: which variable should be privileged?

Number of dimensions

• The maximum number of dimensions for the analysis is equal to

– the number of rows minus one, or

– the number of columns minus one (whichever is smaller)

• In our example, the maximum number of dimensions would be five, which reduces to four due to missing values in one row category.

• As shown later in this section, one may then choose to graphically represent only a sub-set of the extracted dimensions (usually two or three) to make interpretation easier
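The rule can be checked with a small sketch. The category counts used here are a reading of the SPSS output tables in this lecture (six economic-position categories, one of which carries no data, and seven active tenure categories), not figures stated on this slide; the function names are invented for the example.

```python
def max_dimensions(n_rows, n_cols):
    """Largest number of dimensions: the smaller of rows - 1 and columns - 1."""
    return min(n_rows - 1, n_cols - 1)


def chi_square_dof(n_rows, n_cols):
    """Degrees of freedom of the Chi-square test on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


# Assumed counts: 6 row categories in principle, 5 with data, 7 active columns.
dims_all_rows = max_dimensions(6, 7)
dims_active_rows = max_dimensions(5, 7)
dof_active = chi_square_dof(5, 7)

print(dims_all_rows, dims_active_rows, dof_active)
```

Under these assumed counts the degrees of freedom agree with the 24 reported in the footnote of the SPSS summary table.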


Distance measure

• Chi-square distance (as discussed earlier)

• Euclidean distance

– uses the square root of the sum of squared differences between pairs of rows and pairs of columns

– this also requires one to choose a method for centering the data (see the SPSS manual for details)

• For this example, standard correspondence analysis (with the Chi-square distance) does not require a standardization method.

Normalization method

• Defines how correspondence analysis is run: whether to give priority to comparisons between the categories for x (rows) or those for y (columns)

• This choice influences the way distances are summarized by the first dimensions

• Row principal normalization: the Euclidean distances in the final bivariate plot of x and y are as close as possible to the Chi-square distances between the rows, that is the categories of x

• The opposite is valid for the column principal method

• Symmetrical normalization: the distances on the graph resemble as much as possible distances for both x and y by spreading the total inertia symmetrically

• Principal normalization: inertia is first spread over the scores for x, then y

• Weighted normalization: defines a weighting value between minus one and plus one, where minus one is the column principal, zero is symmetrical and plus one is the row principal

• EFS example: the row principal method is more appropriate, as it is more relevant to see how differences in socio-economic conditions impact on the tenure type than it is to look at distances between tenure types.


Additional statistics

Although CA is a nonparametric method, it is possible to compute standard deviations and correlations under the assumption of multinomial distribution of the cell frequencies (when data are obtained as a random sample from a normally distributed population)

This option allows one to order the categories of x and y using scores obtained from CA. E.g. the tenure types and the socio-economic conditions might follow some ordering but cannot be defined with sufficient precision to consider these variables as ordinal. One can use the scores in the first dimension (or the first two) to order the categories and produce a permuted correspondence table.

Plots

Three graphs:

• Biplot (both x & y)

• x only (rows)

• y only (columns)

One usually chooses to represent only the first two or three of the extracted dimensions

Output

• The first dimension explains 85%, the first two 93% of total inertia. However, note that total inertia does not correspond to total variability, but to the variability of the extracted dimensions

• The SV (singular value) is the square root of inertia (the eigenvalue)

• Usually a value of total inertia above 0.2 is regarded as acceptable

• The Chi-square stat suggests strong and significant association

• The precision measures (standard deviations and correlations of the singular values) are based on the multinomial distribution assumption

Summary

                                                      Proportion of Inertia        Confidence Singular Value
Dimension   Singular   Inertia   Chi Square   Sig.    Accounted for   Cumulative   Standard    Correlation
            Value                                                                  Deviation   2       3       4
1           .669       .447                           .850            .850         .031        .094    -.032   -.022
2           .209       .044                           .083            .933         .055                .011    .081
3           .173       .030                           .057            .990         .055                        -.042
4           .072       .005                           .010            1.000        .053
Total                  .526      231.402      .000a   1.000           1.000

a. 24 degrees of freedom
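The relationships between the columns of the summary output can be cross-checked with a few lines of arithmetic. The singular values, total inertia and Chi-square value below are transcribed from the SPSS summary; the implied sample size is only an inference from "total inertia = Chi-square / number of observations" and is not a figure stated in the slides.

```python
# Figures transcribed from the SPSS summary output.
singular_values = [0.669, 0.209, 0.173, 0.072]
reported_total_inertia = 0.526
reported_chi_square = 231.402

# Inertia of each dimension is the squared singular value (the eigenvalue).
inertias = [sv ** 2 for sv in singular_values]
total_inertia = sum(inertias)

# Proportion of inertia accounted for by each dimension, and cumulative share.
proportions = [value / total_inertia for value in inertias]
cumulative = [sum(proportions[: k + 1]) for k in range(len(proportions))]

# Inferred, not reported: total inertia = Chi-square / n, so n = Chi-square / inertia.
implied_n = reported_chi_square / reported_total_inertia

print([round(value, 3) for value in inertias])
print([round(value, 3) for value in proportions])
print([round(value, 3) for value in cumulative])
print(round(implied_n))
```

Small differences from the printed table are expected because the inputs are already rounded to three decimals.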

Row scores

• The mass column shows the relative weight of each category on the sample

• Scores are computed for each category but the supplementary one, provided there are no missing data. Scores are the coordinates for the map

• The inertia column shows how total inertia has been distributed across rows (similar to communalities)

• Full-time employees and the retired have a higher relevance because they are the more important categories in the original correspondence table. These two categories (especially retirement) strongly contribute to explaining the first dimension

• The second dimension is characterized by unemployed and part-time employees

Overview Row Points (b)

Economic position of                          Score in Dimension
Household Reference Person         Mass       1        2        3        4        Inertia
Self-employed                      .080       .296     .025     .433     -.164    .024
Fulltime employee                  .539       .527     .049     -.039    .026     .152
Pt employee                        .077       -.239    -.409    -.352    -.143    .028
Unemployed                         .018       -.154    -1.223   .509     .241     .033
Work related govt train prog (a)   .000       .        .        .        .        .
Ret unoc over min ni age           .286       -.999    .089     .015     .019     .288
Active Total                       1.000                                          .526

Contribution

Economic position of               Of Point to Inertia of Dimension    Of Dimension to Inertia of Point
Household Reference Person         1        2        3        4        1        2        3        4        Total
Self-employed                      .016     .001     .496     .407     .290     .002     .620     .089     1.000
Fulltime employee                  .334     .030     .027     .071     .984     .008     .005     .002     1.000
Pt employee                        .010     .295     .318     .300     .156     .453     .336     .055     1.000
Unemployed                         .001     .622     .157     .202     .013     .814     .141     .032     1.000
Work related govt train prog (a)   .000     .000     .000     .000     .        .        .        .        .
Ret unoc over min ni age           .639     .052     .002     .020     .992     .008     .000     .000     1.000
Active Total                       1.000    1.000    1.000    1.000

a. Supplementary point

b. Row Principal normalization
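The link between masses, scores and contributions in the row output can be checked numerically. The sketch below assumes the usual relationship under row principal normalization, namely that the mass-weighted sum of squared row scores on a dimension equals that dimension's inertia, and that a point's contribution is its share of that sum. The masses, first-dimension scores and reported contributions are transcribed from the Overview Row Points output for the five active categories.

```python
# Active row categories: (mass, score on dimension 1, reported contribution to dimension 1).
rows = {
    "Self-employed": (0.080, 0.296, 0.016),
    "Fulltime employee": (0.539, 0.527, 0.334),
    "Pt employee": (0.077, -0.239, 0.010),
    "Unemployed": (0.018, -0.154, 0.001),
    "Ret unoc over min ni age": (0.286, -0.999, 0.639),
}

# Mass-weighted sum of squared scores: should be close to the inertia of dimension 1 (.447).
dim1_inertia = sum(mass * score ** 2 for mass, score, _ in rows.values())

# Contribution of each point to the inertia of dimension 1.
contributions = {
    name: mass * score ** 2 / dim1_inertia
    for name, (mass, score, _) in rows.items()
}

print(round(dim1_inertia, 3))
for name, value in contributions.items():
    print(name, round(value, 3), "reported:", rows[name][2])
```

Any remaining gaps between computed and reported values come from the three-decimal rounding of the printed output.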


Column scores

• The same exercise is carried out on columns; however, the row principal method does not normalize by column

• By column, the first dimension is especially related to the “owned with mortgage” and “owned outright” categories

Overview Column Points (b)

                                                Score in Dimension
Tenure – type                        Mass       1        2        3        4        Inertia
Local Authority rented unfurnished   .098       -.699    -1.993   .051     1.106    .039
Housing association                  .066       -.781    -1.263   2.821    -1.273   .039
Other rented unfurnished             .050       .487     -2.023   -2.190   .891     .022
Rented furnished                     .032       .531     -1.098   -2.270   -4.585   .014
Owned with mortgage                  .457       .971     .371     .233     .133     .196
Owned by rental purchase             .002       1.179    1.120    -1.287   5.002    .002
Owned outright                       .295       -1.244   .819     -.382    .018     .214
Rent free (a)                        .009       -.957    -1.039   -2.996   -3.705   .007
Active Total                         1.000                                          .526

Contribution

                                     Of Point to Inertia of Dimension    Of Dimension to Inertia of Point
Tenure – type                        1        2        3        4        1        2        3        4        Total
Local Authority rented unfurnished   .048     .388     .000     .120     .548     .436     .000     .016     1.000
Housing association                  .040     .105     .524     .107     .462     .118     .405     .014     1.000
Other rented unfurnished             .012     .205     .240     .040     .245     .413     .333     .010     1.000
Rented furnished                     .009     .038     .164     .669     .284     .119     .349     .248     1.000
Owned with mortgage                  .431     .063     .025     .008     .982     .014     .004     .000     1.000
Owned by rental purchase             .003     .003     .004     .057     .725     .064     .058     .153     1.000
Owned outright                       .457     .198     .043     .000     .954     .040     .006     .000     1.000
Rent free (a)                        .000     .000     .000     .000     .512     .059     .338     .090     1.000
Active Total                         1.000    1.000    1.000    1.000

a. Supplementary point

b. Row Principal normalization

Bi-plot

• Employed individuals are closer to owned accommodations

• Retired individuals are also close to owned accommodations

• Part-time employees and unemployed individuals are closer to rented accommodations and other forms of accommodations

Multiple Correspondence Analysis (MCA)

When all variables are multiple nominal, then optimal scaling applies MCA

Plot with 3 variables

The analysis now also includes the government office region

Field Work


Complete the questions on NYU Home under our Class website. Submit online by next week!
