By Agostino Di Ciaccio, Mauro Coli, José Miguel Angulo Ibáñez

The theme of the meeting was “Statistical Methods for the Analysis of Large Data-Sets”. In recent years there has been increasing interest in this subject; in fact, a huge quantity of information is often available, but standard statistical techniques are not well suited to managing this kind of data. The conference serves as an important meeting point for European researchers working on this topic, and a number of European statistical societies participated in the organization of the event. The book includes 45 papers selected from the 156 papers accepted for presentation and discussed at the conference on “Advanced Statistical Methods for the Analysis of Large Data-sets.”

**Additional info for Advanced Statistical Methods for the Analysis of Large Data-Sets (Studies in Theoretical and Applied Statistics / Selected Papers of the Statistical Societies)**

**Sample text**

$$E(Y_s \mid s, \chi_s, G = c) = \gamma_c + \langle A_c, s \rangle + \langle B_c, \chi_s \rangle + \langle C_c \chi_s, s \rangle \qquad (10)$$

where $\{\gamma_c, A_c, B_c, C_c\}_{c=1,\dots,C}$ are the estimated regression functions given by the generalized least squares criterion. If $n$ data points have been recorded, the clusterwise linear regression algorithm simultaneously finds an optimal partition $\{P_1, \dots, P_C\}$ of the $n$ points and the regression models $\{\gamma_c, A_c, B_c, C_c\}_{c=1,\dots,C}$ associated with each cluster, which optimize the criterion

$$\sum_{c=1}^{C} \sum_{i \in P_c} \Big( y_{s_i} - \big(\gamma_c + \langle A_c, s_i \rangle + \langle B_c, \chi_{s_i} \rangle + \langle C_c \chi_{s_i}, s_i \rangle\big) \Big)^2 \qquad (11)$$

that is, minimizing the sum of squared errors $\mathrm{SSE}_c$ over the $C$ clusters.
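The alternating scheme behind criterion (11) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: ordinary least squares replaces generalized least squares, the functional covariate $\chi_s$ is replaced by a hypothetical scalar covariate `x`, and the names `design_matrix` and `clusterwise_regression` are illustrative. Each pass refits one linear model per cluster and then moves every point to the cluster whose model gives it the smallest squared residual, so the total SSE never increases from one pass to the next.

```python
import numpy as np


def design_matrix(s, x):
    """Stack the columns [1, s, x, x * s] used by each cluster-level model.

    Hypothetical simplification: s holds planar coordinates with shape (n, 2)
    and x is one scalar covariate per site with shape (n,), standing in for
    the functional covariate of the text.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    s = np.asarray(s, dtype=float)
    return np.hstack([np.ones_like(x), s, x, x * s])


def clusterwise_regression(s, x, y, n_clusters, n_iter=50, seed=0):
    """Alternate per-cluster OLS fits with reassignment of points.

    Returns (labels, coefs, sse). OLS is used here, not GLS.
    """
    X = design_matrix(s, x)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)  # random initial partition
    coefs = np.zeros((n_clusters, p))
    for _ in range(max(n_iter, 1)):
        # Step 1: fit one linear model per cluster (skip clusters too small to fit).
        for c in range(n_clusters):
            members = labels == c
            if members.sum() >= p:
                coefs[c], *_ = np.linalg.lstsq(X[members], y[members], rcond=None)
        # Step 2: squared residual of every point under every cluster's model.
        sq_res = (y[:, None] - X @ coefs.T) ** 2  # shape (n, n_clusters)
        new_labels = sq_res.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
    sse = float(sq_res[np.arange(n), labels].sum())
    return labels, coefs, sse
```

Because the result depends on the random initial partition, one would normally run several restarts with different `seed` values and keep the partition with the lowest SSE.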

We propose to solve this problem by evaluating, for each cluster, the kriging predictor at the locations of the grid, in order to obtain the most representative kriging predictor. The prototype is the predictor that gives the best spatio-functional fit (5) among the set of prototypes estimated at the different spatial locations. Once the prototypes have been estimated, we allocate each new curve to a cluster according to the following allocation function:

$$\Delta : \ldots \mapsto \{\ldots\}, \qquad \Delta(\ldots, s_c) = \frac{1}{\alpha} \int V(\ldots(t))\,dt \qquad (5)$$

where $\alpha$ is the kriging coefficient or weight such that $|s_\alpha - s_c| \cong h$, with $h = |s_i - s_c|$.
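A rough sketch of the prototype and allocation steps follows. It assumes sites in the plane, curves sampled on a common grid `t`, and a hypothetical exponential variogram with fixed `sill` and `scale`. Ordinary kriging of the cluster's curves is carried out at each candidate grid location, the kriged curve with the smallest total integrated squared distance to the cluster's curves is kept as the prototype (a stand-in for the fitting criterion (5)), and a new curve is assigned to the nearest prototype in the same integrated $L^2$ sense. All function names are illustrative, and the distance is a simplification of the allocation function above, not a transcription of it.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, scale=0.5):
    """Hypothetical variogram model: gamma(h) = sill * (1 - exp(-h / scale))."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / scale))


def ordinary_kriging_weights(sites, target, variogram=exponential_variogram):
    """Solve the ordinary kriging system for a single target location.

    sites: (n, 2) locations of the observed curves; target: (2,) location.
    The extra unknown is a Lagrange multiplier enforcing sum(weights) == 1.
    """
    sites = np.asarray(sites, dtype=float)
    n = len(sites)
    pairwise = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(pairwise)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = variogram(np.linalg.norm(sites - np.asarray(target, dtype=float), axis=1))
    solution = np.linalg.solve(lhs, rhs)
    return solution[:n]  # drop the Lagrange multiplier


def integrated_sq_distance(f, g, t):
    """Trapezoidal approximation of the integral of (f(t) - g(t))**2 dt."""
    d2 = (np.asarray(f) - np.asarray(g)) ** 2
    return float(np.sum((d2[1:] + d2[:-1]) * np.diff(t)) / 2.0)


def cluster_prototype(sites, curves, grid, t):
    """Krige the cluster's curves at each grid location and keep the best fit.

    curves: (n, len(t)) array of curves observed at `sites`. The kept curve
    minimizes the total integrated squared distance to the cluster's curves.
    """
    curves = np.asarray(curves, dtype=float)
    best_curve, best_cost = None, np.inf
    for location in grid:
        kriged = ordinary_kriging_weights(sites, location) @ curves
        cost = sum(integrated_sq_distance(kriged, c, t) for c in curves)
        if cost < best_cost:
            best_curve, best_cost = kriged, cost
    return best_curve


def allocate(curve, prototypes, t):
    """Return the index of the prototype nearest to `curve` in integrated L2."""
    distances = [integrated_sq_distance(curve, p, t) for p in prototypes]
    return int(np.argmin(distances))
```

In a real analysis the variogram would be estimated from the data rather than fixed in advance, and without a nugget term the kriging predictor reproduces an observed curve exactly at its own site.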

Focussing on the last plot, at least two features need to be discussed. First, note the clear vertical shift between the orange and the black line: this points to the presence of non-negligible phase variability within the original data, and thus to the necessity of aligning the data before undertaking any further analysis. Second, once it has been decided that alignment is needed, note that the orange line shows no evident improvement in performance when three clusters are used instead of two: this suggests that $k = 2$ is the correct number of clusters.
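The reading of the orange line can be expressed as a simple stopping rule: take the smallest number of clusters for which adding one more cluster brings no evident gain. The sketch below assumes a higher-is-better performance score for each value of $k$ and a hypothetical threshold `min_gain` for what counts as "evident"; it is a heuristic illustration, not the authors' selection procedure, and the scores in the usage note are made up rather than read from the plot.

```python
def choose_k(scores, min_gain):
    """Smallest k such that moving to the next k gains less than min_gain.

    scores maps the number of clusters k to a performance score where higher
    is better; min_gain is a hypothetical threshold for an "evident" gain.
    If every step is an evident gain, the largest k tried is returned.
    """
    ks = sorted(scores)
    for k, k_next in zip(ks, ks[1:]):
        if scores[k_next] - scores[k] < min_gain:
            return k
    return ks[-1]
```

For instance, with hypothetical scores `{1: 0.80, 2: 0.90, 3: 0.905}` and `min_gain=0.02`, `choose_k` returns `2`.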