
Robust regression with censored data and application to the analysis of hospital cost of stay

Applicant Marazzi Alfio
Number 116357
Funding scheme Project funding (Div. I-III)
Research institution Institut Universitaire de Médecine Sociale et Préventive - IUMSP CHUV et Université de Lausanne
Institution of higher education University of Lausanne - LA
Main discipline Mathematics
Start/End 01.07.2007 - 31.01.2010
Approved amount 99'171.00

All Disciplines (2)

Public Health and Health Services

Keywords (9)

Multiple regression; robust regression; censored observations; hospital cost; informative censoring; diagnosis related group; TML estimate; generalized log-gamma regression; medical costs

Lay Summary (English)

The aim of this project is to continue a series of research projects on the development of robust statistical methods for the analysis of positive, asymmetrically distributed random variables. We consider methods to estimate the “mean” of such a variable as a function of a number of covariates in the presence of outliers, i.e., atypical extreme observations. Our main practical interest is the estimation and analysis of the “mean length” and the “mean cost” of hospital stays as a function of available administrative and medical data, which often contain extremely expensive or unusually cheap cases.
In a previous SNSF-funded research project, we developed a robust regression estimate called the truncated maximum likelihood (TML) estimate. We assumed that the error distribution belongs to a location-scale family of asymmetric distributions. The TML has very desirable theoretical properties: it can attain full (100%) efficiency relative to the optimal maximum likelihood estimate while retaining the maximum breakdown point of 50%, i.e., a very high degree of resistance to outliers. The method provides reliable descriptions of the functional relationship between cost and available covariates, such as length of stay (LOS), admission type and insurance type, even when a substantial proportion of the data is atypical. In a more recent study, we extended the TML to incorporate censoring. Discharge data (and thus LOS and cost) are often censored because not all patients are followed until the endpoint of interest, which is usually the time of discharge home.
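The TML idea described above (a robust initial fit, rejection of observations whose residuals fall outside cutoffs, then maximum likelihood on the retained data with the likelihood corrected for the truncation) can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a log-normal cost model without covariates, uses the median and MAD as the initial estimate, and uses symmetric cutoffs at ±c with a hypothetical default c = 2.5, whereas the TML in the project is designed for regression with asymmetric error distributions. All function names (`tml_lognormal`, `neg_trunc_loglik`, `nelder_mead`) are illustrative.

```python
import math
import random
from statistics import NormalDist, median

# Assumption for this sketch: log(cost) = mu + sigma * U with U ~ N(0, 1).
STD_NORMAL = NormalDist()


def nelder_mead(f, x0, step=0.1, iters=400, tol=1e-9):
    """Minimal Nelder-Mead minimiser (reflection, expansion, contraction, shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    vals = [f(p) for p in simplex]
    for _ in range(iters):
        order = sorted(range(n + 1), key=vals.__getitem__)
        simplex = [simplex[k] for k in order]
        vals = [vals[k] for k in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        refl = [c + (c - w) for c, w in zip(cen, worst)]
        fr = f(refl)
        if fr < vals[0]:
            expd = [c + 2 * (c - w) for c, w in zip(cen, worst)]
            fe = f(expd)
            simplex[-1], vals[-1] = (expd, fe) if fe < fr else (refl, fr)
        elif fr < vals[-2]:
            simplex[-1], vals[-1] = refl, fr
        else:
            contr = [c + 0.5 * (w - c) for c, w in zip(cen, worst)]
            fc = f(contr)
            if fc < vals[-1]:
                simplex[-1], vals[-1] = contr, fc
            else:  # shrink every vertex towards the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                vals = [vals[0]] + [f(p) for p in simplex[1:]]
    return min(zip(vals, simplex))[1]


def neg_trunc_loglik(theta, z, a, b):
    """Negative log-likelihood of N(mu, sigma^2) truncated to [a, b], constants dropped."""
    mu, log_s = theta
    s = math.exp(log_s)
    mass = STD_NORMAL.cdf((b - mu) / s) - STD_NORMAL.cdf((a - mu) / s)
    if mass <= 0.0:
        return math.inf
    ss = sum((zi - mu) ** 2 for zi in z)
    return ss / (2 * s * s) + len(z) * (log_s + math.log(mass))


def tml_lognormal(y, c=2.5):
    """Simplified truncated-ML sketch; returns (mu, sigma, mean of Y, n rejected)."""
    z = [math.log(v) for v in y]
    mu0 = median(z)                                    # robust initial location
    s0 = median([abs(zi - mu0) for zi in z]) / 0.6745  # MAD, consistent at the normal
    a, b = mu0 - c * s0, mu0 + c * s0                  # cutoffs fixed after the initial fit
    kept = [zi for zi in z if a <= zi <= b]            # reject outlying log costs
    mu, log_s = nelder_mead(lambda t: neg_trunc_loglik(t, kept, a, b),
                            [mu0, math.log(s0)])
    s = math.exp(log_s)
    # Mean on the original (cost) scale under the log-normal assumption.
    return mu, s, math.exp(mu + s * s / 2), len(z) - len(kept)


# Usage on simulated data: 95% log-normal stays plus 5% extremely expensive ones.
rng = random.Random(42)
clean = [math.exp(rng.gauss(8.0, 0.5)) for _ in range(1900)]
outliers = [math.exp(rng.gauss(14.0, 0.1)) for _ in range(100)]
costs = clean + outliers
mu_hat, s_hat, mean_hat, n_rejected = tml_lognormal(costs)
naive_mean = sum(costs) / len(costs)
print(mu_hat, s_hat, mean_hat, n_rejected, naive_mean)
```

Because the cutoffs are held fixed after the initial fit, the retained observations follow a truncated normal distribution, and maximising its conditional likelihood removes the bias that plain trimming would introduce. The naive sample mean, by contrast, is dominated by the few extreme stays.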
In this project we propose two further extensions of the robust methods. First, to make the error model more flexible, we plan to develop robust methods with a high breakdown point and high efficiency for the generalized log-gamma regression model. This model has three parameters and includes as special cases many of the models considered in the previous projects (e.g., the log-Weibull and normal models). Second, we plan to improve the methods for the analysis of hospital cost of stay to account for informative censoring. The usual assumption of non-informative censoring (censoring independent of duration) is adequate on the duration (LOS) scale. However, it can be shown that it does not hold on the cost scale: patients with a higher daily cost accumulate cost faster, so the cost observed at the censoring time is related to their total cost even when censoring is independent of LOS. Therefore, techniques for censored durations cannot be used directly to model censored cost of stay. We plan to develop robust techniques to estimate the mean cost of stay by combining the estimated models for censored LOS with separate estimates of the mean cost per unit of time.
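The combination strategy in the previous paragraph (a model for censored LOS plus a separate estimate of the mean cost per unit of time) can be sketched with deliberately simple, non-robust ingredients: a Kaplan-Meier estimate of mean LOS and an average daily cost. The sketch assumes that cost equals LOS times a daily rate that is independent of LOS, so that E[cost] = E[LOS] · E[rate], and that censoring is non-informative on the LOS scale; the robust techniques planned in the project are not shown. The function names are hypothetical.

```python
def km_restricted_mean(times, events):
    """Area under the Kaplan-Meier survival curve up to the largest observed time.

    times: observed LOS in days; events: 1 = discharged, 0 = censored.
    Equals the sample mean when nothing is censored; if the longest stay is
    censored, the curve is cut off there and the mean is underestimated.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv, area, t_prev, i = 1.0, 0.0, 0.0, 0
    while i < len(pairs):
        t = pairs[i][0]
        area += surv * (t - t_prev)           # S(t) is flat on [t_prev, t)
        tied = discharged = 0
        while i < len(pairs) and pairs[i][0] == t:
            tied += 1
            discharged += pairs[i][1]
            i += 1
        surv *= 1.0 - discharged / at_risk     # Kaplan-Meier step at time t
        at_risk -= tied                        # stays censored at t leave after t
        t_prev = t
    return area


def mean_cost_per_day(costs, times):
    """Average daily cost, using the cost accrued up to discharge or censoring."""
    rates = [c / t for c, t in zip(costs, times) if t > 0]
    return sum(rates) / len(rates)


def combined_mean_cost(times, events, costs):
    # Hypothetical assumption: cost = LOS * rate, with rate independent of LOS.
    # Censoring is handled only on the LOS scale, never on the cost scale.
    return km_restricted_mean(times, events) * mean_cost_per_day(costs, times)


# Usage: six stays, two of them censored (event = 0).
los = [3, 5, 5, 8, 12, 20]
discharged = [1, 1, 0, 1, 0, 1]
cost = [2400, 5000, 4500, 6400, 13200, 16000]
print(combined_mean_cost(los, discharged, cost))
```

Censoring never enters the cost computation directly: it is handled by the Kaplan-Meier step on the LOS scale, where the non-informative assumption is plausible, and the daily rate only rescales the result.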
Last update: 21.02.2013

Associated projects

Number Title Start Funding scheme
108424 Robust regression with censored data and application to the analysis of hospital cost of stay 01.04.2005 Project funding (Div. I-III)
141266 Robust negative binomial regression with application to the analysis of hospital length of stay 01.07.2012 Project funding (Div. I-III)