# Conferences

#### C1 - Additive Models for Quantile Regression: Model Selection and Confidence Bands


Roger Koenker (U. Illinois)

Abstract: *We describe some recent developments of nonparametric methods for estimating conditional quantile functions using additive models with total variation roughness penalties. We focus attention primarily on selection of smoothing parameters and on the construction of confidence bands for the nonparametric components. Both pointwise and uniform confidence bands are introduced; the uniform bands are based on the Hotelling (1939) tube approach. Some simulation evidence is presented to evaluate finite sample performance and the methods are also illustrated with an application to modeling childhood malnutrition in India.*
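
For readers unfamiliar with quantile regression, the conditional quantile estimates in the abstract arise as minimizers of the Koenker-Bassett check loss. A minimal sketch of that loss (the function name is mine; the talk's additive models add a total-variation roughness penalty to this objective, which is omitted here):

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}).
    Summed over residuals u = y - q, it is minimized when q is the
    tau-th sample quantile of the y values."""
    return u * (tau - (1.0 if u < 0 else 0.0))
```

Positive and negative residuals are weighted asymmetrically (tau versus 1 - tau), which is what tilts the fit toward the tau-th quantile rather than the mean.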

#### C2 - Life is not normal! GAMLSS modelling in practice

Mikis Stasinopoulos (London Metropolitan U.)

Abstract: *The Normal (or Gaussian) distribution has dominated statistical analysis in the 20th century. This led to mathematically elegant "exact" solutions in statistical inference and to well-known tests based on Student's t and Fisher's F distributions. Unfortunately this assumption does not hold in general, leading to "exact solutions applied to the wrong problem rather than approximate solutions applied to the right problem". Statisticians have tried to overcome the problem of normality and allow flexible modelling with the introduction of the Generalised Linear Model (GLM) (Nelder and Wedderburn, 1972), the Generalised Additive Model (GAM) (Hastie and Tibshirani, 1990) and the Generalised Additive Models for Location, Scale and Shape (GAMLSS) (Rigby and Stasinopoulos, 2005). GAMLSS models not only the location and scale but also the skewness and kurtosis as linear and/or smooth functions of explanatory variables. This talk will discuss:*

*• a brief history of GAMLSS*

*• the R implementation of the GAMLSS models (Stasinopoulos and Rigby, 2007)*

*• the involvement of the speaker with the Department of Nutrition for Health and Development of the World Health Organisation and the construction of the worldwide standard growth centile curves (WHO, 2006, 2007, 2009)*

*• the current and future directions for the GAMLSS project.*

#### C3 - A New Class of Flexible Link Function with Application to Spatially Correlated Species Co-occurrence in the Cape Floristic Region

Dipak K. Dey (U. Connecticut)

(Joint work with Xun Jiang and Kent E. Holsinger)

Abstract: *Understanding the mechanisms that allow biological species to co-occur is of great interest to ecologists. Here we investigate the factors that influence co-occurrence of members of the genus Protea in the Cape Floristic Region of southwestern Africa, a global hotspot of biodiversity. Due to the binomial nature of our response, a critical issue is to choose appropriate link functions for the regression model. In this paper we propose a new family of flexible link functions for modeling binomial response data. By introducing a power parameter into the cdf corresponding to a symmetric link function and its mirror reflection, greater flexibility in skewness can be achieved in both the positive and negative directions. Through simulated data sets and analysis of the Protea co-occurrence data, we show that the proposed link function is quite flexible and performs better against link misspecification than existing standard link functions.*
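
The power-parameter construction can be sketched in a few lines. This is an illustration under my own assumptions (a logistic base cdf and my own function names), not the paper's exact family:

```python
import math

def logistic_cdf(x):
    """Symmetric base cdf F; any symmetric cdf could play this role."""
    return 1.0 / (1.0 + math.exp(-x))

def power_link_inv(x, r):
    """Inverse link F(x)**r: raising a symmetric cdf to the power r > 0
    skews the response curve in one direction (r = 1 recovers the
    symmetric logit link)."""
    return logistic_cdf(x) ** r

def mirror_power_link_inv(x, r):
    """Mirror reflection 1 - F(-x)**r, which skews in the opposite
    direction; together the two forms cover both signs of skewness."""
    return 1.0 - logistic_cdf(-x) ** r
```

At r = 1 both forms coincide with the ordinary logit inverse link; r away from 1 bends the curve asymmetrically toward 0 or 1.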

#### C4 - Confidence Distributions and Confidence Likelihood - Distributional Inference without Priors


Tore Schweder (U. Oslo)

Abstract: *Confidence distributions, derived from the fiducial argument or by probability-transforming the (profile) likelihood function, provide confidence intervals and p-values. Confidence is an epistemic probability concept. It represents the degree of belief that follows from the data and the statistical model, but is less subjective than the Bayesian posterior since no prior is involved. Confidence distributions for vector parameters cannot in general be marginalized to a confidence distribution for a derived parameter by integration, as Fisher claimed for his fiducial distributions. The confidence curve (Birnbaum, 1961) represents the confidence distribution and is useful in reporting and in obtaining a likelihood function related to its confidence distribution. Confidence distributions and confidence likelihoods are useful for integrative statistical analysis and meta-analysis. In addition to presenting confidence distributions, the fiducial debate will be briefly reviewed and comparisons will be made with Bayesian inference.*

#### C5 - Bayesian Graphical Model Determination

Peter J. Green (Bristol, UK and UTS, Australia)

Abstract: *The structure in a multivariate distribution is largely captured by the conditional independence relationships that hold among the variables, often represented graphically, and inferring these from data is an important step in understanding a complex stochastic system. Simultaneous inference about the conditional independence graph and the parameters of the model is known as joint structural and quantitative learning in the machine learning literature: it is appealing to conduct this in the Bayesian paradigm, but this can pose computational challenges because of the huge size of the model space that is involved, unless there are very few variables. After setting the scene, I will present some recent joint work with Alun Thomas (Utah) that exploits new results on perturbations to graphs that maintain decomposability and on enumeration of junction trees, to construct a Markov chain sampler on junction trees that can be used to compute joint inference about structure and parameters in graphical models on quite a large scale.*

#### C6 - Fast Semi-parallel Regression Computations for Genetic Association Studies

Paul Eilers (U. Rotterdam)

Abstract: *The advent of inexpensive microarrays has made large-scale genotyping of hundreds of thousands of SNPs (single nucleotide polymorphisms) feasible. This has led to a lot of activity in genome-wide association studies (GWAS) to detect relationships between genetic variants and phenotypes. This has not been a resounding success, because associations have turned out to be extremely weak. The GWAS community hopes to counter this setback by increasing sample sizes and measuring and imputing more SNPs.*

*The computations for GWAS are not complicated: basically, a regression model is estimated for each SNP, including covariates such as age, gender, height and weight. When population stratification is present, a popular correction method is to compute principal components of the correlation matrix between subjects (as determined from the genotypes) and to include them in the regression too. It is not unusual to have one to five million SNPs, a large proportion of them imputed. That means computing one to five million regression models. Typically, grids of multiple computers are used to spread the computational load and to get decent run times. However, computing grids are expensive and hard to use.*

*The goal of my presentation is to show that dramatic speed-ups are possible by exploiting two ideas. One idea is to apply existing theory for updating regression equations. The covariates are fixed for each person and only the pattern of genotypes changes from one SNP to the other, so we can apply rank-one updates. The second idea is to organize the computations in such a way that they can be performed as simple matrix operations that handle thousands of SNPs at the same time. This is what I call semi-parallel, to set it apart from the common interpretation of parallel computation as distributed over multiple processors or computers.*

*The advantages of faster computation are even greater when we have to deal with mixed models. There is a growing interest in studying association between genotypes and longitudinal phenotypes. A mixed model is a natural candidate, but it typically takes a second or more to estimate it, for only one SNP. With millions of SNPs that would take a month on one computer. With semi-parallel algorithms accelerations of several orders of magnitude are possible.*
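
The matrix-operation idea can be illustrated with a small NumPy sketch (names and structure are mine, not the speaker's code): residualize the phenotype and all SNP columns on the fixed covariates once, then every per-SNP slope is a ratio of cross-products, with no per-SNP regression loop. By the Frisch-Waugh-Lovell theorem this matches fitting each full regression separately.

```python
import numpy as np

def semiparallel_snp_effects(y, X, G):
    """Per-SNP slope estimates for the models y ~ X + g_j, computed for
    all columns g_j of G at once with matrix operations instead of a
    per-SNP regression loop."""
    Q, _ = np.linalg.qr(X)             # orthonormal basis of the covariate space
    y_res = y - Q @ (Q.T @ y)          # residual of y on X
    G_res = G - Q @ (Q.T @ G)          # residuals of all SNPs on X, in one sweep
    return (G_res.T @ y_res) / (G_res ** 2).sum(axis=0)
```

The two matrix products dominate the cost and scale with the number of SNPs handled per block, which is where the claimed speed-up over one-model-at-a-time fitting comes from.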

#### C7 - A variable range Markov random field model to find SNPs dependence windows


Florencia Leonardi (USP)

Abstract: *SNP (Single Nucleotide Polymorphism) genotyping platforms are of great value for gene mapping of complex diseases but bring up new issues related to data analysis. Although the high density of these molecular markers enables the study of dependence patterns between loci over the genome, methods for this purpose are usually based on pairwise linkage disequilibrium analysis. We propose to model SNP data as an inhomogeneous variable range Markov random field on Z. In order to infer the range of dependence among SNPs, we propose a penalized maximum likelihood criterion and show the almost sure consistency of this method. By examining rheumatoid arthritis data from the Genetic Analysis Workshop (GAW16), we show the utility of this variable range model for finding case-control-associated dependence windows.*

#### C8 - Cholesky Stochastic Volatility Models for High-Dimensional Time Series

Hedibert Lopes (U. Chicago)

Abstract: *Multivariate time-varying volatility has many important applications in finance, including asset allocation and risk management. Estimating multivariate volatility, however, is not straightforward because of two major difficulties. The first difficulty is the curse of dimensionality: for p time series, there are p(p+1)/2 volatility and cross-correlation series. The second difficulty is that the conditional covariance matrix must be positive definite at all time points, which is not easy to maintain when the dimension is high. In order to maintain positive definiteness simply, we model the Cholesky root of the time-varying p x p covariance matrix. Our approach is Bayesian, and we propose prior distributions that allow us to search for simplifying structure without placing hard restrictions on the parameter space. Our modeling approach is chosen to allow for parallel computation, and we show how to optimally distribute the computations across processors. We illustrate our approach with a number of real and synthetic examples, including a real application with 94 components (p=94) of the S&P 100 index. This is joint work with Robert McCulloch and Ruey Tsay.*
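
The positive-definiteness device can be illustrated in a few lines (a sketch under my own naming, not the authors' model): any lower-triangular matrix L with positive diagonal yields a valid covariance L L', so the p(p+1)/2 entries of L can be modelled without constraints.

```python
import numpy as np

def random_cholesky_root(p, rng):
    """Unconstrained strictly-lower-triangular entries plus a positive
    (exponentiated) diagonal: a free parameterization of the root."""
    L = np.tril(rng.normal(size=(p, p)), k=-1)
    L[np.diag_indices(p)] = np.exp(rng.normal(size=p))  # enforce diag > 0
    return L

def cov_from_cholesky_root(L):
    """L @ L.T is symmetric and positive definite whenever diag(L) > 0,
    so modelling L sidesteps the positive-definiteness constraint."""
    return L @ L.T
```

Note the count matches the abstract: p diagonal plus p(p-1)/2 sub-diagonal entries of L give the p(p+1)/2 free series per time point.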

#### C9 - Convergence of the empirical distribution of the aligned letter pairs in an optimal alignment of random sequences


Heinrich Matzinger (Georgia Tech)

Abstract: *Let X and Y be two independent random sequences of length n. We consider their optimal alignments according to a scoring function S. We show that when the scoring function S is chosen at random, then with probability 1 the frequency of the aligned letter pairs converges to a unique distribution as n goes to infinity. We also show some concentration of measure phenomena.*
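
For concreteness, the optimal alignment score of two sequences under a pairwise scoring function S can be computed with the standard dynamic program. This sketch uses my own conventions; the gap score is an added assumption, since the abstract only specifies the pair score S:

```python
def optimal_alignment_score(x, y, S, gap=0.0):
    """Needleman-Wunsch-style dynamic program: D[i][j] is the best
    score aligning the first i letters of x with the first j of y,
    maximizing the total of pair scores S(a, b) plus gap scores."""
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i * gap
    for j in range(m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + S(x[i - 1], y[j - 1]),  # align pair
                          D[i - 1][j] + gap,                        # gap in y
                          D[i][j - 1] + gap)                        # gap in x
    return D[n][m]
```

The empirical distribution studied in the talk counts how often each letter pair (a, b) is matched in an alignment achieving this optimal score.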

#### C10 - Elliptical Semiparametric Mixed Models

Gilberto A. Paula (USP)

Abstract: *In this talk we discuss elliptical semiparametric mixed models (Ibacache Pulgar et al. (2012)) in which the nonparametric component is formed by cubic splines. In particular, we discuss estimation and inference procedures as well as residual analyses and studies to assess the sensitivity of the parametric and nonparametric estimates. This class has a formulation similar to the parametric class with elliptical errors discussed in Savalli et al. (2006) and Osorio et al. (2007), in which the marginal model is also elliptical and the variance-covariance structures, as well as the mean of the hierarchical model, are preserved in the marginal model. The estimates of the nonparametric component are robust against outlying observations, as are the estimates of the parameters of the parametric component and of the dispersion matrix. The kurtosis of the distribution of the responses of each individual (group) is flexible, and variance-component tests can be developed similarly to the parametric case. Three data sets previously analyzed under normal errors are reanalyzed under elliptical errors.*

#### C11 - Knudsen Gas in Random Tube

Marina Vachkovskaia (UNICAMP)

Abstract: *We shall present an elementary introduction to the theory of stochastic billiards, describing joint work of the author with Francis Comets, Serguei Popov and Gunter Schütz. A stochastic billiard in a general domain is defined as follows: a particle moves with constant velocity inside some domain ${\mathcal D} \subset {\mathbb R}^d$ until it hits the boundary, where it bounces back inside randomly according to some reflection law; the angle of the outgoing velocity with the inner normal vector has a specified, absolutely continuous density. We present various results concerning the properties of this process and describe several applications to gas dynamics and to the sampling of random vectors in high dimensions.*
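
The dynamics are easy to simulate in the simplest setting. A sketch for the unit disk with the cosine (Knudsen) reflection law, under my own naming; the talk treats general domains and random tubes, not just the disk:

```python
import math
import random

def knudsen_billiard_hits(steps, seed=1):
    """Boundary chain of the stochastic billiard in the unit disk.
    From the boundary point at angle phi, the outgoing direction makes
    a random angle theta with the inner normal, drawn from the cosine
    density cos(theta)/2 on (-pi/2, pi/2) via inverse-cdf sampling;
    chord geometry puts the next hit at angle phi + pi + 2*theta."""
    rng = random.Random(seed)
    phi = 0.0
    hits = [phi]
    for _ in range(steps):
        theta = math.asin(2.0 * rng.random() - 1.0)   # cosine reflection law
        phi = (phi + math.pi + 2.0 * theta) % (2.0 * math.pi)
        hits.append(phi)
    return hits
```

Between boundary hits the particle moves in a straight line, so the sequence of hit angles fully determines the trajectory in this convex case.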

#### C12 - A stochastic model for treatment-induced drug resistance

Rinaldo Schinazi (U. Colorado)

Abstract: *We use a branching process to compute the probability of drug resistance during treatment.*