On this Picostat.com statistics page, you will find information about the milk data set, which contains Daudin's Milk Composition Data. The milk data set is found in the robustbase R package. To load it in R, first attach the package with library(robustbase) and then issue the command data("milk") at the console; this loads the data into a variable called milk. Alternatively, data("milk", package = "robustbase") loads the data without attaching the package. If R says the milk data set is not found, install the package by issuing install.packages("robustbase") and then try loading the data again. If you need to download R, you can go to the R project website. You can also download a CSV (comma-separated values) version of the milk R data set; the file is about 3,647 bytes.
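As a minimal sketch of the steps above, the following R session loads the data, checks its dimensions, and writes a CSV copy of it. The file name milk.csv and the use of tempdir() are illustrative choices only; the file written this way need not be byte-for-byte identical to the CSV offered for download here.

```r
# Attach robustbase so that data() can find its data sets
# (run install.packages("robustbase") first if this fails).
library(robustbase)
data("milk")

# Inspect the shape and the column types
dim(milk)   # 86 rows, 8 columns, as described on this page
str(milk)

# Write a CSV copy; the path is an illustrative choice
csv_path <- file.path(tempdir(), "milk.csv")
write.csv(milk, csv_path, row.names = FALSE)

# Read it back and check that the numeric values survived the round trip
milk_csv <- read.csv(csv_path)
max(abs(as.matrix(milk) - as.matrix(milk_csv)))   # should be essentially 0
```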
Daudin's Milk Composition Data
Daudin et al. (1988) give 8 readings on the composition of 86
containers of milk. They refer to only 85 observations, which is
explained by the fact that observations 63 and 64 are identical
(as noted by Rocke and Woodruff (1996)).
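The duplicated pair can be checked directly in R. This sketch assumes robustbase is installed; base R's duplicated() flags every row that repeats an earlier one, and unique() keeps only the first occurrence of each row, which is what the unique(milk) call in the examples at the end of this page relies on.

```r
library(robustbase)
data("milk")

# TRUE for each row that repeats an earlier row
dup <- duplicated(milk)
which(dup)   # per the description above, row 64 repeats row 63

# Compare the two rows value by value
all(unlist(milk[63, ]) == unlist(milk[64, ]))

# unique() drops the repeated row, leaving the 85 distinct observations
umilk <- unique(milk)
nrow(umilk)
```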
Daudin et al. used the data set to analyse the stability of principal
component analysis (PCA) with the bootstrap method. In the same context,
but using high breakdown point robust PCA, these data were analysed by
Todorov et al. (1994). Atkinson (1994) used these data to illustrate
the forward search algorithm for identifying multiple outliers.
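The general idea of studying PCA stability by the bootstrap can be sketched in base R: resample the rows with replacement, rerun the PCA, and look at how much the results move. This is an illustration, not the procedure of Daudin et al.; the use of correlation-based PCA (scale. = TRUE), the number of replicates B = 200, and the two summaries share1 and align1 are assumptions chosen for the example.

```r
library(robustbase)
data("milk")

set.seed(1)   # arbitrary seed, for reproducibility only
B <- 200      # number of bootstrap replicates (assumed)

# PCA on the full sample; scale. = TRUE works on the correlation matrix,
# since the variables are on different scales
full <- prcomp(milk, scale. = TRUE)
v1 <- full$rotation[, 1]

boot_stats <- t(replicate(B, {
  idx <- sample(nrow(milk), replace = TRUE)   # resample observations
  pc <- prcomp(milk[idx, ], scale. = TRUE)
  # share of total variance carried by the first component
  share1 <- pc$sdev[1]^2 / sum(pc$sdev^2)
  # |cosine| between bootstrap and full-sample PC1 loadings;
  # abs() removes the arbitrary sign of an eigenvector
  align1 <- abs(sum(pc$rotation[, 1] * v1))
  c(share1 = share1, align1 = align1)
}))

# A narrow interval for share1 and align1 values close to 1
# suggest a stable first principal component
apply(boot_stats, 2, quantile, probs = c(0.025, 0.5, 0.975))
```

A robust variant in the spirit of Todorov et al. would replace the classical covariance with a high breakdown point estimate such as the MCD; that step is not shown here.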
A data frame with 86 observations on the following 8 variables; all
variables except the first are measured in grams per liter.
cheese dry substance measured in the factory
cheese dry substance measured in the laboratory
milk dry substance
References
Daudin, J.J., Duby, C. and Trecourt, P. (1988)
Stability of Principal Component Analysis Studied by the Bootstrap Method.
Statistics 19, 241–258.
Todorov, V., Neyko, N. and Neytchev, P. (1994)
Stability of High Breakdown Point Robust PCA.
In Short Communications, COMPSTAT'94; Physica Verlag, Heidelberg.
Atkinson, A.C. (1994)
Fast Very Robust Methods for the Detection of Multiple Outliers.
J. Amer. Statist. Assoc. 89, 1329–1339.
Rocke, D.M. and Woodruff, D.L. (1996)
Identification of Outliers in Multivariate Data.
J. Amer. Statist. Assoc. 91 (435), 1047–1061.
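Examples
library(robustbase)  # provides covMcd(), summarizeRobWeights() and the milk data
data("milk")
(c.milk <- covMcd(milk))  # MCD estimate of location and scatter
summarizeRobWeights(c.milk$mcd.wt) # 19..20 outliers
umilk <- unique(milk) # dropping obs. 64 (== obs. 63)
summary(cumilk <- covMcd(umilk, nsamp = "deterministic")) # 20 outliers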
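To see why a robust estimator matters here, one common diagnostic compares robust distances, computed from the MCD center and covariance, with classical Mahalanobis distances, computed from the sample mean and covariance. The sketch below uses base R's mahalanobis() and the usual cutoff at the square root of the 97.5% chi-squared quantile with p degrees of freedom; that cutoff is a conventional choice assumed here, and the observations flagged this way are a diagnostic view that need not coincide exactly with the weights reported by summarizeRobWeights().

```r
library(robustbase)
data("milk")

c.milk <- covMcd(milk)
p <- ncol(milk)

# Robust distances from the MCD location and scatter estimates
rd <- sqrt(mahalanobis(milk, center = c.milk$center, cov = c.milk$cov))
# Classical distances from the sample mean and covariance
md <- sqrt(mahalanobis(milk, center = colMeans(milk), cov = cov(milk)))

# Conventional cutoff: square root of the 97.5% chi-squared quantile, p df
cutoff <- sqrt(qchisq(0.975, df = p))

# Observations flagged by each approach; outliers that inflate the
# classical covariance can mask each other, so md may flag fewer of them
which(rd > cutoff)
which(md > cutoff)
```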
Dataset imported from https://www.r-project.org.