**Navigation**

**Contact us**

- Scientific Leader
- tel: +33 3 20 43 68 76

- Team Assistant
- tel: +33 3 59 57 78 45

**Research Organizations**

**Current Collaborations**

**Related Inria teams**

*© 2016-2018 Modal-Team. All rights reserved.*

mixtcomp

The demonstrator is located at https://modal-research.lille.inria.fr/BigStat. To work on your own data, you first need to create an account. Its use is straightforward: you upload three files packed in a zip archive, wait for the processing, then download a zip file containing a `.RData` file that can be read by R. Note that this platform is currently in beta.

The three files to be uploaded in a zip file depend on the mode:

| learn | predict |
|---|---|
| data.csv | data.csv |
| descriptor.csv | descriptor.csv |
| param.ini | output.RData |
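As a convenience, the archive can be assembled with a script before uploading. The following is a minimal sketch using Python's standard `zipfile` module; `build_archive` is a hypothetical helper, and it assumes the three files should sit at the root of the archive (the page does not say whether subdirectories are accepted).

```python
import zipfile
from pathlib import Path

# Files expected in the uploaded archive for each mode, per the table above.
REQUIRED_FILES = {
    "learn": ("data.csv", "descriptor.csv", "param.ini"),
    "predict": ("data.csv", "descriptor.csv", "output.RData"),
}


def build_archive(mode: str, input_dir: Path, archive_path: Path) -> Path:
    """Hypothetical helper: zip the three files required by `mode`."""
    if mode not in REQUIRED_FILES:
        raise ValueError(f"unknown mode {mode!r}, expected 'learn' or 'predict'")
    names = REQUIRED_FILES[mode]
    missing = [n for n in names if not (input_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing file(s) for {mode} mode: {missing}")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            # Assumption: files are stored at the archive root, without directories.
            zf.write(input_dir / name, arcname=name)
    return archive_path
```

For example, `build_archive("learn", Path("mydata"), Path("learn.zip"))` fails early if `param.ini` is missing, instead of after the upload.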

The demonstrator has two modes of operation for MixtComp, “learn” and “predict”.

- In learning, the parameters of the mixtures are estimated.
- In prediction, only the missing values (including the latent class) are estimated, using the parameters estimated during a previous learning run.

In this section you will find a description of the syntax, and in a following section some examples of test files.

The descriptor file contains the names of the variables on its first line, and the names of the models to be applied on its second line. Currently three models are available:

- Categorical_pjk
- Gaussian_sjk
- Poisson_k

If no information on the latent class is provided, the code runs in unsupervised mode. However, semi-supervised or fully supervised computations can be carried out by providing a `z_class` variable. In that case, its model must be `LatentClass`.

- descriptor.csv

```
categorical1;categorical2;categorical3;gaussian1;gaussian2;gaussian3;poisson1;poisson2;poisson3;z_class
Categorical_pjk;Categorical_pjk;Categorical_pjk;Gaussian_sjk;Gaussian_sjk;Gaussian_sjk;Poisson_k;Poisson_k;Poisson_k;LatentClass
```
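The two-line structure of the descriptor can be generated from a mapping of variable names to model names. This is a minimal sketch; `descriptor_text` is a hypothetical helper, and it only checks the model names listed on this page plus the rule that `z_class` must use `LatentClass`. The trailing newline is an assumption.

```python
# Model names accepted by the demonstrator, as listed on this page.
KNOWN_MODELS = {"Categorical_pjk", "Gaussian_sjk", "Poisson_k", "LatentClass"}


def descriptor_text(models: dict[str, str]) -> str:
    """Hypothetical helper: variable names on line 1, model names on line 2."""
    for var, model in models.items():
        if model not in KNOWN_MODELS:
            raise ValueError(f"{var}: unknown model {model!r}")
        if var == "z_class" and model != "LatentClass":
            raise ValueError("z_class must use the LatentClass model")
    # dict preserves insertion order, so both lines list the variables in the same order.
    names = ";".join(models)
    kinds = ";".join(models.values())
    return f"{names}\n{kinds}\n"
```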

- Categorical data must be coded as contiguous integers, with the first modality coded as 1.

- data.csv

```
categorical1;categorical2;categorical3;gaussian1;gaussian2;gaussian3;poisson1;poisson2;poisson3;z_class
4;1;4;-0.3580979364;-0.3021767542;-0.1075462398;16;4;9;{2}
2;1;4;-0.3365602931;-0.1935100901;-0.2883606085;13;4;11;?
2;1;2;-0.2257433203;-0.3339290504;0.0209779879;21;7;14;2
```
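If your raw categorical data uses text labels, it has to be recoded to satisfy the rule that modalities are contiguous integers starting at 1. A minimal sketch follows; `recode_categorical` is a hypothetical helper that numbers labels in order of first appearance (any order works as long as the codes are 1..K) and leaves the completely missing marker `?` untouched. Other partial-observation syntaxes are not handled here.

```python
def recode_categorical(column: list[str]) -> tuple[list[str], dict[str, int]]:
    """Hypothetical helper: map category labels to contiguous codes 1..K."""
    codes: dict[str, int] = {}
    out: list[str] = []
    for cell in column:
        if cell == "?":
            # Completely missing value: keep the marker as-is.
            out.append(cell)
            continue
        if cell not in codes:
            # First time this label is seen: give it the next free code.
            codes[cell] = len(codes) + 1
        out.append(str(codes[cell]))
    return out, codes
```

For instance, `["red", "blue", "?", "red", "green"]` becomes `["1", "2", "?", "1", "3"]`; keep the returned mapping to interpret the estimated parameters per modality.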

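Partially or completely missing values are written in data.csv with the following syntax; an X marks the models that accept it:

| Syntax | Categorical_pjk | Gaussian_sjk | Poisson_k | LatentClass |
|---|---|---|---|---|
| `?` (completely missing) | X | X | X | X |
| `{a,b,c}` (finite number of values authorized) | X | | | X |
| `[a:b]` (bounded interval) | | X | | |
| `[-inf:b]` (semi-bounded interval) | | X | | |
| `[a:+inf]` (semi-bounded interval) | | X | | |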
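Before uploading, each cell can be classified by its syntax and checked against the model of its column. This is a minimal sketch; `classify_cell`, `check_cell` and the `ALLOWED` mapping are hypothetical, and the mapping is our reading of the table (completely missing for every model, finite sets for Categorical_pjk and LatentClass, intervals for Gaussian_sjk). It only checks the shape of a cell, not the numeric content of interval bounds.

```python
import re

# Partial-observation syntaxes accepted by each model (our reading of the table).
ALLOWED = {
    "Categorical_pjk": {"missing", "finite"},
    "Gaussian_sjk": {"missing", "interval"},
    "Poisson_k": {"missing"},
    "LatentClass": {"missing", "finite"},
}

# "[lower:upper]", where a bound may be a number, -inf or +inf.
_INTERVAL = re.compile(r"^\[[^:\]]+:[^:\]]+\]$")


def classify_cell(cell: str) -> str:
    """Return 'missing', 'finite', 'interval' or 'observed' for one data.csv cell."""
    cell = cell.strip()
    if cell == "?":
        return "missing"
    if cell.startswith("{") and cell.endswith("}"):
        return "finite"
    if _INTERVAL.match(cell):
        return "interval"
    return "observed"


def check_cell(cell: str, model: str) -> None:
    """Raise ValueError if `model` does not accept the syntax used in `cell`."""
    kind = classify_cell(cell)
    if kind != "observed" and kind not in ALLOWED[model]:
        raise ValueError(f"{cell!r} ({kind}) is not accepted by model {model}")
```

With the example data above, `check_cell("{2}", "LatentClass")` passes, whereas an interval in a Poisson column would be rejected.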

The param.ini file contains all the runtime parameters. At the moment, it only contains the number of classes requested, as an `nbCluster` parameter.

- param.ini

```
nbCluster = 2
```
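Since param.ini has no `[section]` header, Python's `configparser` would reject it as-is, so a plain `key = value` reader is enough. A minimal sketch with hypothetical helpers `param_ini_text` and `read_nb_cluster`; it assumes the file holds only `key = value` lines, as in the example.

```python
def param_ini_text(nb_cluster: int) -> str:
    """Hypothetical helper: param.ini content for the requested number of classes."""
    if nb_cluster < 1:
        raise ValueError("nbCluster must be a positive integer")
    return f"nbCluster = {nb_cluster}\n"


def read_nb_cluster(text: str) -> int:
    """Hypothetical helper: extract nbCluster from 'key = value' lines."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "nbCluster":
            return int(value.strip())
    raise KeyError("nbCluster not found in param.ini")
```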

The output.RData file is obtained as the result of a “learn” run and contains a description of the estimated parameters used for the prediction. It is in binary format and does not need to be edited by the user.

The syntax of the CSV files must be respected: fields are delimited by `;`, and strings are not quoted.
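When the CSV files are produced from another tool, the default comma delimiter and automatic quoting are the usual pitfalls. A minimal sketch with Python's standard `csv` module; `to_mixtcomp_csv` is a hypothetical helper. With `quoting=csv.QUOTE_NONE` and no escape character, the writer raises `csv.Error` instead of silently quoting a field that contains `;`.

```python
import csv
import io


def to_mixtcomp_csv(header: list[str], rows: list[list[str]]) -> str:
    """Hypothetical helper: ';'-delimited text with no quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", quoting=csv.QUOTE_NONE, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)  # raises csv.Error if a field contains ';'
    return buf.getvalue()
```

Cells such as `{1,2}` or `?` pass through unchanged, since neither contains the `;` delimiter.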

- Here is an archive containing three files that you can use to test the “learn” mode of the demonstrator: datalearn.zip. It contains a heterogeneous set of models (multinomial, Poisson and Gaussian).
- Here is an archive containing three files that you can use to test the “predict” mode of the demonstrator: datapredict.zip. The parameters in the output.RData file have been estimated from the learning set above.

The result is downloaded as an RData file containing a named list `res`, organized as a hierarchy of elements. For example, to access the parameters of the categorical1 variable, use `res$variable$param$categorical1$stat`, where you will find a table of the form:

| | expectation | q 2.5% | q 97.5% |
|---|---|---|---|
| k: 1, modality: 1 | 0.3 | 0.25 | 0.35 |
| k: 1, modality: 2 | 0.7 | 0.69 | 0.71 |
| k: 2, modality: 1 | 0.6 | 0.54 | 0.63 |
| k: 2, modality: 2 | 0.4 | 0.35 | 0.41 |

If you look at the parameters for a Gaussian variable, for example at `res$variable$param$gaussian1$stat`, you will find a table of the form:

| | expectation | q 2.5% | q 97.5% |
|---|---|---|---|
| k: 1, mean | 3.0 | 2.9 | 3.1 |
| k: 1, sd | 0.7 | 0.69 | 0.75 |
| k: 2, mean | 4.0 | 3.95 | 4.1 |
| k: 2, sd | 0.4 | 0.25 | 0.56 |

These tables contain the various parameters of each model. The expectation and quantiles correspond to the estimation performed during the SEM algorithm. The row labels should be self-explanatory for the various types of models.
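To illustrate what an expectation with 2.5% and 97.5% quantiles summarizes, here is a minimal sketch computing them from a sequence of parameter values collected across SEM iterations. This is an assumption about the general idea, not MixtComp's code: `summarize_draws` and its `burn_in` argument are hypothetical (the latter only by analogy with `nbBurnInIter` in `res$strategy`), and MixtComp's exact quantile definition may differ from the one used by `statistics.quantiles`.

```python
import statistics


def summarize_draws(draws: list[float], burn_in: int = 0) -> tuple[float, float, float]:
    """Hypothetical helper: (expectation, 2.5% quantile, 97.5% quantile).

    The first `burn_in` values are discarded before summarizing.
    """
    kept = draws[burn_in:]
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(kept, n=40)
    return statistics.fmean(kept), cuts[0], cuts[-1]
```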

The complete hierarchy of `res` is:

```
res
  strategy
    nbTrialInInit
    nbBurnInIter
    nbIter
    nbGibbsBurnInIter
    nbGibbsIter
  mixture
    nbCluster
    nbFreeParameters
    lnObservedLikelihood
    lnSemiCompletedLikelihood
    lnCompletedLikelihood
    BIC
    ICL
    runTime
    nbSample
    warnLog
  variable
    data
      z_class
        completed    <- imputed classes
        stat         <- a posteriori distribution of the class for each individual, p(z_i | x_i)
      categorical1
        completed
        stat
      categorical2, etc.
    param
      z_class
        stat         <- model proportions and quantiles
        log
      categorical1
        stat
        log
      categorical2, etc.
```

Note that the `z_class` variable contains all the information pertaining to the latent classes:

- `res$variable$data$z_class$completed` contains the imputed class, $\hat{z}_i$;
- `res$variable$data$z_class$stat` contains the estimated a posteriori probabilities, $\hat{t}_{ik}$;
- `res$variable$param$z_class$stat` contains the proportions, $\hat{\pi}_k$.
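The three quantities above are linked by two textbook mixture-model relations: the maximum a posteriori (MAP) class $\arg\max_k \hat{t}_{ik}$, and the EM-style proportion estimate $\frac{1}{n}\sum_i \hat{t}_{ik}$. The sketch below computes both from a matrix of $\hat{t}_{ik}$ (for instance exported from R); the helpers are hypothetical, and MixtComp's own `completed` classes and proportions come from its SEM procedure, so they need not match these values exactly. Classes are numbered from 1, as in the `z_class` column of data.csv.

```python
def map_classes(t: list[list[float]]) -> list[int]:
    """Hypothetical helper: for each individual i, the class k (1-based) maximizing t_ik."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in t]


def mean_proportions(t: list[list[float]]) -> list[float]:
    """Hypothetical helper: pi_k = (1/n) * sum_i t_ik (EM-style M-step formula)."""
    n = len(t)
    return [sum(row[k] for row in t) / n for k in range(len(t[0]))]
```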

mixtcomp.txt · Last modified: 2015/07/03 13:12 by kubicki