\documentclass[a4paper, 12pt]{article}
\renewcommand{\baselinestretch}{2}
\addtolength{\oddsidemargin}{-.575in}
\addtolength{\evensidemargin}{-.875in}
\addtolength{\textwidth}{1.35in}
\addtolength{\topmargin}{-.275in}
\addtolength{\textheight}{1.75in}
\begin{document}
\begin{center}
\textbf{STATISTICAL ANALYSIS OF REPORTED CHOLERA CASES IN GHANA. A CASE STUDY IN KORLE BU POLYCLINIC}
\end{center}
\textbf{GROUP MEMBERS}\\
BIDINLIB MATTHEW 6207711\\
AFELETEY ISAAC PROMISE 6205211\\
NARTEY EUNICE KAKIE (MISS) 6209511\\
\textbf{PROJECT SUPERVISOR}\\
DR. G. OKYERE
\begin{center}
INTRODUCTION.
\end{center}
Cholera is an infection of the small intestine. It is caused by eating food or drinking water contaminated with the bacterium \textit{Vibrio cholerae}. It causes severe watery diarrhea and vomiting, which can lead to dehydration and even death if untreated. Every year, there are an estimated 3 to 5 million cholera cases and 100 000 to 120 000 deaths due to cholera. The short incubation period of two hours to five days enhances the potentially explosive pattern of outbreaks (WHO, 2014). About 75\% of people infected with \textit{Vibrio cholerae} do not develop any symptoms, although the bacteria are present in their faeces for 7 to 14 days after infection and are shed back into the environment, potentially infecting other people. Among people who develop symptoms, 80\% have mild or moderate symptoms, while around 20\% develop acute watery diarrhea with severe dehydration. This can lead to death if untreated (WHO, 2014). Cholera was first identified in the early 1800s in Asia.\\
The first pandemic occurred in the Bengal region of India from 1817 to 1824. The disease spread from India to Southeast Asia, China, Japan, the Middle East, and southern Russia. The disease is most common in places with poor sanitation, crowding, war, and famine. Common locations include parts of Africa, south Asia, and Latin America.\\
Ghana has seen outbreaks of the disease since the 1970s.
Between 1970 and 2012, Ghana recorded a total of 5,498 cholera deaths, according to data compiled by the World Health Organization. Of these, 1,546 deaths were recorded between 1970 and 1980 and 2,258 between 1981 and 1990. Between 1991 and 1999, cholera claimed 1,067 lives, and between 2000 and 2012, 627 deaths were recorded (ghanaweb.com).\\
Considering death as an event, one would want to assess the time to the event and the probability that an individual will experience the event within a given duration of time. This is a typical case of survival analysis, which looks at time to event data. The Cox proportional hazards model in survival analysis is a good tool to consider. It will help us explore the relationship between the survival of a cholera patient and several explanatory variables such as
\begin{itemize}
\item The age of the patient
\item Educational status of the patient
\item Gender of the patient
\item Geographical location of the patient
\item Socioeconomic status of the patient.
\end{itemize}
In this chapter, an overview of the Cox proportional hazards model is given; a brief description of the problem statement of the thesis is also presented, together with the objectives, the methodology, the justification and the organization of the thesis.
\newpage
\textbf{BACKGROUND OF STUDY}\\
Survival analysis is the modern name given to the collection of statistical procedures which accommodate time-to-event censored data. It is concerned with studying the time between entry to a study and a subsequent event (such as death). Survival analysis typically focuses on time to event data.
In the most general sense, it consists of techniques for positive-valued random variables, such as
\begin{itemize}
\item Time to death
\item Time to onset (or relapse) of a disease
\item Duration of a strike
\item Money paid by health insurance
\end{itemize}
We may be interested in characterizing the distribution of ``time to event'' for a given population, in comparing this ``time to event'' among different groups (e.g., treatment vs. control in a clinical trial or an observational study), or in modeling the relationship of ``time to event'' to other covariates (sometimes called prognostic factors or predictors) (). Typically, in biomedical applications the data are collected over a finite period of time, and consequently the ``time to event'' may not be observed for all the individuals in the study population (sample). This results in what is called censored data: the ``time to event'' for those individuals who have not experienced the event under study is censored (by the end of the study).\\
It is also common for the amount of follow-up to vary from subject to subject. Survival analysis examines and models the time it takes for events to occur. The prototypical such event is death, from which the name ``survival analysis'' and much of its terminology derive, but the ambit of application of survival analysis is much broader. Essentially the same methods are employed in a variety of disciplines under various rubrics, for example `event-history analysis' in sociology. In this study, therefore, terms such as survival are to be understood generically.\\
Survival analysis focuses on the distribution of survival times. Although there are well-known methods for estimating unconditional survival distributions, most interesting survival modeling examines the relationship between survival and one or more predictors, usually termed covariates in the survival-analysis literature (John Fox, 2002).
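The censored data described above are commonly written in the following way. This is only a sketch, and the symbols $T_{i}$, $C_{i}$, $t_{i}$ and $\delta_{i}$ are notation assumed for this illustration rather than taken from the sources cited. Let $T_{i}$ be the true time to event and $C_{i}$ the censoring time of individual $i$. What is actually recorded is the pair
\[ t_{i} = \min(T_{i}, C_{i}), \qquad \delta_{i} = \left\{ \begin{array}{ll} 1 & \mbox{if } T_{i} \leq C_{i} \mbox{ (event observed)} \\ 0 & \mbox{if } T_{i} > C_{i} \mbox{ (censored)} \end{array} \right. \]
For instance, in a hypothetical study that ends after 30 days, a patient who dies on day 12 contributes $(t_{i}, \delta_{i}) = (12, 1)$, while a patient who is still alive on day 30 contributes $(30, 0)$.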
A Cox proportional hazards model is a well-recognized statistical technique for exploring the relationship between the survival of a patient and several explanatory variables. It provides an estimate of the treatment effect on survival after adjustment for other explanatory variables, and it allows us to estimate the hazard (or risk) of death, or other event of interest, for individuals, given their prognostic variables.\\
The Cox model is based on a modeling approach to the analysis of survival data. The purpose of the model is to simultaneously explore the effects of several variables on survival. When it is used to analyse the survival of patients in a clinical trial, the model allows us to isolate the effects of treatment from the effects of other variables. The model can also be used, a priori, if it is known that there are other variables besides treatment that influence patient survival and these variables cannot be easily controlled in a clinical trial. Using the model may improve the estimate of treatment effect by narrowing the confidence interval. Survival times now often refer to the development of a particular symptom or to relapse after remission of a disease, as well as to the time to death. Cox's method does not assume any particular distribution for the survival times; rather, it assumes that the effects of the different variables on survival are constant over time and are additive on a particular scale. Interpreting a Cox model involves examining the coefficients for each explanatory variable. A positive regression coefficient for an explanatory variable means that the hazard is higher, and thus the prognosis worse, for patients with higher values of that variable. Conversely, a negative regression coefficient implies a better prognosis for patients with higher values of that variable (http://www.xlstat.com/en/).
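As a minimal illustration of this interpretation (a sketch in standard notation, not a result of this study), the Cox model for a patient with covariates $x_{1},\ldots,x_{p}$ is usually written as
\[ \lambda(t \mid x) = \lambda_{0}(t)\exp(\beta_{1}x_{1} + \cdots + \beta_{p}x_{p}), \]
where $\lambda_{0}(t)$ is an unspecified baseline hazard. For two patients who differ by one unit in $x_{k}$ only, the ratio of their hazards is $\exp(\beta_{k})$ at every time $t$. For example, a hypothetical coefficient $\beta_{k} = 0.693$ gives a hazard ratio of $\exp(0.693) \approx 2.0$, that is, roughly double the hazard, while $\beta_{k} = -0.693$ gives $\exp(-0.693) \approx 0.5$, roughly half the hazard. The coefficient values here are assumed purely for illustration.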
\newpage
\textbf{PROBLEM STATEMENT}\\
The outbreak of cholera in Ghana saw many people infected by the deadly disease. Because the disease is deadly, we cannot be certain whether a patient will survive or die. Some people attribute the survival of a cholera patient to treatment, others attribute it to gender, while others attribute it to chance. There is therefore a need to explore the factors that cause the death of a patient. Since the lives of people are involved here, one would not want to guess what will influence the death of a patient. It is therefore necessary to determine empirically whether a patient will experience the event of death given some explanatory variables. The specific problem this thesis seeks to solve is to mathematically model the relationship between the survival of a patient and several explanatory variables.\\
\textbf{OBJECTIVE OF STUDY}
\begin{itemize}
\item To mathematically model the relationship between the survival of a patient and some explanatory variables
\item To determine the multiplicative effect of each variable on the hazard
\item To determine the probability that an individual will experience the event of death within a small time interval, given that the individual has survived up to the beginning of the interval
\end{itemize}
\textbf{METHODOLOGY}\\
The Cox PH model in survival analysis will help us model this problem mathematically using explanatory variables, which are the factors that can affect the death of a patient. It will also help us assess the multiplicative effect of each variable on the hazard and the probability that an individual will experience an event (for example, death) within a small time interval, given that the individual has survived up to the beginning of the interval.
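To make the quantities named in the methodology concrete, the following is a sketch under the usual proportional hazards assumptions; the symbols $\eta$ and $S_{0}(t)$ are introduced here only for illustration. If the hazard of a patient with covariates $x$ is $\lambda(t \mid x) = \lambda_{0}(t)\exp(\eta)$ with $\eta = \beta_{1}x_{1} + \cdots + \beta_{p}x_{p}$, then the survival function of that patient is
\[ S(t \mid x) = \exp\left(-\int_{0}^{t} \lambda_{0}(u)\exp(\eta)\,du\right) = [S_{0}(t)]^{\exp(\eta)}, \]
where $S_{0}(t) = \exp\left(-\int_{0}^{t}\lambda_{0}(u)\,du\right)$ is the baseline survival function. As a hypothetical numerical example, if $S_{0}(t) = 0.90$ at some time $t$ and $\exp(\eta) = 2$, then $S(t \mid x) = 0.90^{2} = 0.81$, so doubling the hazard lowers the survival probability at that time from 90\% to 81\%.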
\textbf{JUSTIFICATION OF THE STUDY}\\
The study would help to reduce further deaths among infected cholera patients by identifying the main factor or variable that has the lowest multiplicative effect on the hazard of a patient, together with ways to reduce or manage such a variable. This will also help the Ministry of Health, and the government for that matter, to allocate properly the resources used to address the issue, so that unnecessary allocations are not made, thereby reducing government expenditure on cholera. By determining the probability of a patient experiencing the event of death after surviving to a given time interval, the study will help health practitioners take more pragmatic steps to address the issue with urgency at that interval in order to prevent more deaths. The study will also help the selected hospitals handle cholera cases effectively in the near future.\\
\textbf{LIMITATION OF STUDY}\\
Due to limited funds and time constraints, this thesis focuses on three selected hospitals: Korle Bu Polyclinic, Mamprobi Polyclinic and Tema General Hospital, all in Accra.\\
\textbf{ORGANIZATION OF THESIS}\\
The thesis is organized as follows. In Chapter one, we present the background to the study and the Cox PH model. Chapter two is devoted to a review of related work in the field of cholera and the Cox PH model. Chapter three deals with the methodology used in the formulation of variants, models and the methods and solutions. Chapter four deals with data collection and analysis. Chapter five is the final chapter and provides the conclusions and recommendations of this study.
\newpage
\begin{center}
\underline{\textbf{CHAPTER 2}}\\
\end{center}
\begin{center}
\underline{\textbf{LITERATURE REVIEW}}
\end{center}
\textit{King et al. (1979)} compared the abilities of three diets to keep rats tumor-free.
They were interested in the relationship between diet and the development of tumors and therefore divided 90 rats into three groups and fed them low-fat, saturated fat, and unsaturated fat diets, respectively. The rats were of the same age and species and were in similar physical condition. An identical amount of tumor cells was injected into a foot pad of each rat. The rats were observed for 200 days. Many developed a recognizable tumor early in the study period. Some were tumor-free at the end of the 200 days. Rat 16 in the low-fat group and rat 24 in the saturated fat group died accidentally after 140 days and 170 days, respectively, with no evidence of tumor. Fifteen of the 30 rats on the low-fat diet developed a tumor before the experiment was terminated. The rat that died had a tumor-free time of at least 140 days. The other 14 rats did not develop any tumor by the end of the experiment; their tumor-free times were at least 200 days. Among the 30 rats in the saturated fat diet group, 23 developed a tumor, one died tumor-free after 170 days, and six were tumor-free at the end of the experiment. All 30 rats in the unsaturated fat diet group developed tumors within 200 days. The two early deaths can be considered losses to follow-up. The data are singly censored if the two early deaths are excluded.\\
Stephen J. Walters (School of Health and Related Research (ScHARR), University of Sheffield, 2009) carried out a Cox PH analysis of the data from a randomized trial comparing the effect of low-dose adjuvant interferon alfa-2a therapy with that of no further treatment in patients with malignant melanoma at high risk of recurrence. Malignant melanoma is a serious type of skin cancer, characterized by uncontrolled growth of pigment cells called melanocytes.
In this trial, 674 patients with a radically resected malignant melanoma (who were at high risk of disease recurrence) were randomly assigned to one of two treatment groups: interferon (3 megaunits of interferon alfa-2a three times a week until recurrence of cancer or for two years, whichever occurred first) or no further treatment. The primary aim of this multicentre study was to determine the effects of interferon on overall survival. Patients were followed for up to eight years from randomization. The final Cox model included two demographic variables (age and gender) and one baseline clinical variable (histology) as independent prognostic factors, plus a treatment variable. Model:
\[ \mbox{cox} = 0.004\,\mbox{Age} - 0.312\,\mbox{Sex} - 0.033\,\mbox{Histology1} + 0.446\,\mbox{Histology2} + 0.569\,\mbox{Histology3} - 0.090\,\mbox{Group} \]
It was observed that older age and regionally metastatic cancer histology are associated with poorer survival, whereas being male is associated with better survival.\\
\textit{Masaaki Tsujitani et al. (2012)} discussed a flexible method for modeling survival data using penalized smoothing splines when the values of covariates change over the duration of the study. The Cox proportional hazards model has been widely used for the analysis of treatment and prognostic effects with censored survival data. However, a number of theoretical problems with respect to the baseline survival function remain unsolved. They used generalized additive models (GAMs) with B-splines to estimate the survival function and selected the optimum smoothing parameters based on a variant multifold cross-validation (CV) method. The methods were compared with the generalized cross-validation (GCV) method using data from a long-term study of patients with primary biliary cirrhosis (PBC).\\
In total, 54,519 people from the placebo clusters were assembled. The incidence of cholera (1.30/1000/year) was significantly higher than that of V. parahaemolyticus diarrhea (0.63/1000/year).
Cholera incidence was inversely related to age, whereas the risk of V. parahaemolyticus diarrhea was age-independent. The seasonality of diarrhea due to the two Vibrio species was similar. Cholera was distinguished by a higher frequency of severe dehydration, and V. parahaemolyticus diarrhea by abdominal pain. Hindus and those who lived in households not using boiled or treated water were more likely to have V. parahaemolyticus diarrhea. Young age, low socioeconomic status, and living closer to a project healthcare facility were associated with an increased risk for cholera. The high-risk area for cholera differed from the high-risk area for V. parahaemolyticus diarrhea. They report coexistence of the two vibrios in the slums of Kolkata. The two etiologies of diarrhea had a similar seasonality but had distinguishing clinical features. The risk factors and the high-risk areas for the two diseases differ from one another, suggesting different modes of transmission of these two pathogens (\textit{Kanungo et al., 2012}).\\
\textit{Ali M et al. (2012)} evaluated the herd protection conferred by an oral cholera vaccine using 2 approaches: cluster design and geographic information system (GIS) design. Residents living in 3933 dwellings (clusters) in Kolkata, India, were cluster-randomized to receive either cholera vaccine or oral placebo. Nonpregnant residents aged $\geq 1$ year were invited to participate in the trial. Only the first episode of cholera detected for a subject between 14 and 1095 days after a second dose was considered. In the cluster design, indirect protection was assessed by comparing the incidence of cholera among nonparticipants in vaccine clusters vs those in placebo clusters. In the GIS analysis, herd protection was assessed by evaluating the association between vaccine coverage among the population residing within 250 m of the household and the occurrence of cholera in that population. Results.
Among 107 347 eligible residents, 66 990 received 2 doses of either cholera vaccine or placebo. In the cluster design, the 3-year data showed significant total protection (66\% protection; 95\% confidence interval [CI], 50\%--78\%; $P < .01$) but no evidence of indirect protection. With the GIS approach, the risk of cholera among placebo recipients was inversely related to neighborhood-level vaccine coverage, and the trend was highly significant ($P < .01$). This relationship held in multivariable models that also controlled for potentially confounding demographic variables (hazard ratio, 0.94 [95\% CI, .90--.98]; $P < .01$). They concluded that indirect protection was evident in analyses using the GIS approach but not the cluster design approach, likely owing to considerable transmission of cholera between clusters, which would vitiate herd protection in the cluster analysis.\\
A sample of 432 inmates released from Maryland state prisons was followed for one year after release. The event of interest was the first rearrest. The aim was to determine how the occurrence and timing of arrests depended on several covariates (predictor variables). Some of these covariates (like race, age at release, and number of previous convictions) remained constant over the one-year interval. Others (like marital status and employment status) could change at any time during the follow-up period. It was observed that fully 75 percent of the cases were not arrested during the first year after release, and that someone who is jailed after an arrest is not likely to be working full time in subsequent weeks (Rossi et al., 1980).\\
The analytical effect of the number of events per variable (EPV) in a proportional hazards regression analysis was evaluated using Monte Carlo simulation techniques for data from a randomized trial containing 673 patients and 252 deaths, in which seven predictor variables had an original significance level of $p < 0.10$. The 252 deaths and 7 variables correspond to 36 events per variable analyzed in the full data set. Five hundred simulated analyses were conducted for these seven variables at EPVs of 2, 5, 10, 15, 20, and 25. For each simulation, a random exponential survival time was generated for each of the 673 patients, and the simulated results were compared with their original counterparts. As EPV decreased, the regression coefficients became more biased relative to the true value; the 90\% confidence limits about the simulated values did not have a coverage of 90\% for the original value; large-sample properties did not hold for variance estimates from the proportional hazards model; and the Z statistics used to test the significance of the regression coefficients lost validity under the null hypothesis. Although a single boundary level for avoiding problems is not easy to choose, the value of EPV = 10 seems most prudent. Below this value of EPV, the results of proportional hazards regression analyses should be interpreted with caution because the statistical model may not be valid (\textit{Peduzzi et al., 1995}).\\
Efficacy and safety of a two-dose regimen of bivalent killed whole-cell oral cholera vaccine (Shantha Biotechnics, Hyderabad, India) to 3 years is established, but long-term efficacy is not. The authors aimed to assess protective efficacy up to 5 years in a slum area of Kolkata, India. In their double-blind, cluster-randomized, placebo-controlled trial, they assessed the incidence of cholera in non-pregnant individuals older than 1 year residing in 3933 dwellings (clusters) in Kolkata, India. They randomly allocated participants, by dwelling, to receive two oral doses of modified killed bivalent whole-cell cholera vaccine or heat-killed Escherichia coli K12 placebo, 14 days apart. Randomization was done by use of a computer-generated sequence in blocks of four.
The primary endpoint was prevention of episodes of culture-confirmed Vibrio cholerae O1 diarrhea severe enough for patients to seek treatment in a health-care facility. They identified culture-confirmed cholera cases among participants seeking treatment for diarrhea at a study clinic or government hospital between 14 days and 1825 days after receipt of the second dose. They assessed vaccine protection in a per-protocol population of participants who had completely ingested two doses of the assigned study treatment. They observed that 69 of 31 932 recipients of vaccine and 219 of 34 968 recipients of placebo developed cholera during the 5-year follow-up (incidence 2.2 per 1000 in the vaccine group and 6.3 per 1000 in the placebo group). Cumulative protective efficacy of the vaccine at 5 years was 65\% (95\% CI 52--74; $p<0.0001$), and point estimates by year of follow-up suggested no evidence of decline in protective efficacy. Sustained protection for 5 years at the level they reported had not been noted previously with other oral cholera vaccines. Established long-term efficacy of this vaccine could help policy makers formulate rational vaccination strategies to reduce the overall cholera burden in endemic settings (\textit{Dipika Sur et al., 2011}).\\
\textit{Tiago dos Santos Ferreira et al. (2012)} noted that infection increases the morbidity and mortality in liver cirrhosis patients. The aim of their study was to investigate the impact of infection on survival and the risk factors for death in adult patients with liver cirrhosis in a university hospital. In a retrospective cohort study of Brazilian hospitalized cirrhotic patients, medical records data were analysed, and all patients who had one or more confirmed bacterial infections during admission were selected for the study. Data such as biochemical investigations, Child score, MELD estimation, clinical evolution and death were also included.
Statistical analysis: chi-square, Fisher and Mann-Whitney tests were used. Univariate and multivariate analyses were performed according to the Cox regression model. The significant statistical level was p 2.5 mg/dl had increased the risk of death of 4.1, 3.2 and 3.2, respectively. Conclusion: bacterial infections in hospitalized cirrhotic patients deserve special care, mainly spontaneous bacterial peritonitis, as do patients in whom hyponatremia, upper gastrointestinal bleeding, high levels of creatinine and a high MELD score are found.\\
\textit{Durham et al. (1998)} estimated the efficacy of killed whole-cell-only (WC) and B subunit killed whole-cell (BS-WC) oral cholera vaccines over 4 1/2 years of a vaccine trial in rural Matlab, Bangladesh. The placebo was a killed Escherichia coli strain. The trial was randomized and double-blinded among 89,596 subjects aged 2--15 years (male and female) and greater than 15 years (females only). They restricted their analyses to subjects who received three doses of vaccine or placebo (i.e., the full vaccination regimen) before May 1, 1985. There were 20,837, 20,743, and 20,705 such subjects in the placebo, WC, and BS-WC arms of the trial, respectively. The events of interest were reported, confirmed cases of cholera illness among the study subjects. Cases were classified into the two major biotypes that circulated during the trial, classic and El Tor cholera. They computed Kaplan-Meier estimates of the cholera case survival curves, $S(t)$, for the placebo and the two vaccine groups. They observed good separation between the vaccine and placebo curves, which indicates that the vaccines give protection. The BS-WC vaccine provided better protection during the first year. The curves slowly approach one another, indicating the waning of the protective effect, but this is difficult to see with plots based on cumulative incidence. They also estimated smooth plots of the vaccine efficacy $VE(t)$.
They used a method based on smoothing scaled residuals from a proportional hazards model. In general, they coded vaccine effects with a dichotomous variable $z = 1$ for vaccine and $z = 0$ for placebo, with $\beta(t)$ as the time-varying coefficient for the vaccine effect. Their goal was then to find the smoothed VE estimate $VE(t) = 1 - RR(t) = 1 - e^{\beta(t)}$ and its standard error. The smoothing was carried out with regression splines with four degrees of freedom. In those analyses which were not stratified by age group, they controlled for age effects by including a term for age group in the model.
\newpage
\begin{center}
\underline{\textbf{CHAPTER 3}}\\
\end{center}
\begin{center}
\underline{\textbf{METHODOLOGY}}\\
\end{center}
This chapter discusses the methods for examining and modeling the time it takes for events to occur. In order to understand the Cox PH method, it is necessary to have a good understanding of some of the methods of examining time to event data. Survival analysis is a branch of statistics which deals with the analysis of the time until one or more events happen, such as death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysis in engineering, and duration analysis or duration modeling in economics. Survival analysis attempts to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account?
How do particular circumstances or characteristics increase or decrease the probability of survival?\\
Since Cox PH is a form of (semi-parametric) regression, we first discuss a few things about regression.\\
\underline{\textbf{LINEAR REGRESSION}}\\
Linear regression describes the relation between some explanatory (predictor) variables and a variable of special interest, called the response variable. We use a predictor or ``independent'' variable $x$ to explain some of the uncertainty in a ``dependent'' variable $y$. It helps us to understand the relation between explanatory and response variables and to predict the value of the response variable for new values of the explanatory variables.\\
\underline{\textbf{Simple Linear Regression}}
\[ Y_{i}= \beta_{0} + \beta_{1} x_{i} + \varepsilon_{i}, \hspace{1cm} i=1,\ldots,n \]
where:\\
$Y$ is the response\\
$x$ is the predictor\\
$\beta_{0}$ and $\beta_{1}$ are regression coefficients\\
$\varepsilon$ is an error term, such that\\
$E[\varepsilon_{i}]= 0 \hspace{1cm} \mbox{for all} \: i$\\
$\mathrm{var}(\varepsilon_{i})= \sigma^{2} \hspace{1cm} \mbox{for all} \: i$\\
$\mathrm{cov}(\varepsilon_{i},\varepsilon_{j})= 0 \hspace{1cm} \mbox{for} \: i\ne j$\\
Estimates of $\beta_{0}$ and $\beta_{1}$ are determined by the least squares approach.\\
\underline{\textbf{Multiple Regression}}\\
$ Y_{i}=\beta_{0} +\beta_{1} x_{i1} +\beta_{2} x_{i2}+ \cdots + \beta_{p} x_{ip}+ \varepsilon_{i}, \hspace{1cm} i=1,\ldots,n $\\
where:\\
$Y$ is the response\\
$x_{1},\ldots,x_{p}$ are predictors\\
$\beta_{0},\ldots,\beta_{p}$ are regression coefficients.\\
\underline{\textbf{{\small SURVIVAL ANALYSIS AND COX PROPORTIONAL HAZARDS MODEL}}}\\
Survival analysis is the modern name given to the collection of statistical procedures which accommodate time-to-event censored data. It is concerned with studying the time between entry to a study and a subsequent event (such as death). Survival analysis typically focuses on time to event data.
In the most general sense, it consists of techniques for positive-valued random variables, such as
\begin{itemize}
\item Time to death
\item Time to onset (or relapse) of a disease
\item Duration of a strike
\item Money paid by health insurance
\end{itemize}
Survival models can be viewed as consisting of two parts: the underlying hazard function, often denoted $\lambda_{0}(t)$, which describes how the risk of event per time unit changes over time at baseline levels of covariates; and the effect parameters, which describe how the hazard varies in response to explanatory covariates. Let $T$ represent survival time. We regard $T$ as a random variable with cumulative distribution function
\[ P(t)=\Pr(T\leq t) \]
and probability density function
\[ p(t)=\frac{dP(t)}{dt}. \]
It will often be convenient to work with the complement of the c.d.f., the survival function
\[ S(t) = \Pr(T > t) = 1 - P(t), \]
which gives the probability that the event of interest has not occurred by duration $t$.\\
\underline{\textbf{The Hazard Function}}\\
The hazard function is the instantaneous rate at which an individual experiences an event (for example, death) in a small time interval, given that the individual has survived up to the beginning of the interval:\\
$ \lambda(t)=\lim_{\Delta t \rightarrow 0} \frac{1}{\Delta t} \Pr[(t \leq T<t+ \Delta t) \mid T \geq t] $\\
$ = \lim_{\Delta t \rightarrow 0} \frac{1}{\Delta t} \frac{\Pr([t \leq T<t+ \Delta t] \cap [T \geq t])}{\Pr(T \geq t)} $\\
In the first expression, the numerator is the conditional probability that the event will occur in the interval $[t, t + \Delta t)$ given that it has not occurred before, and the denominator is the width of the interval. Dividing one by the other we obtain a rate of event occurrence per unit of time.
Taking the limit as the width of the interval goes down to zero, we obtain an instantaneous rate of occurrence. Since the event $[t \leq T<t+ \Delta t]$ implies $[T \geq t]$, the intersection reduces to the first event:\\
$ \lambda(t) = \lim_{\Delta t \rightarrow 0} \frac{1}{\Delta t} \frac{\Pr(t \leq T<t+ \Delta t)}{\Pr(T \geq t)} $\\
$ \lambda(t)=\frac{f(t)}{S(t)} $, where $f(t) = p(t)$ denotes the density of $T$.\\
In words, the rate of occurrence of the event at duration $t$ equals the density of events at $t$, divided by the probability of surviving to that duration without experiencing the event. Since $f(t) = -\frac{d}{dt}S(t)$, it follows that\\
$ \lambda(t)=- \frac{d}{dt} \log S(t) $\\
If we now integrate from 0 to $t$ and introduce the boundary condition $S(0) =1$ (since the event is sure not to have occurred by duration 0), we can solve the above expression to obtain a formula for the probability of surviving to duration $t$ as a function of the hazard at all durations up to $t$:\\
$ S(t)=\exp\left(- \int_{0}^{t} \lambda(x)\,dx\right) $\\
Discrete random variables:\\
$ \lambda(a_{j})=\lambda_{j}=\Pr(T=a_{j}\mid T\geq a_{j}) $\\
$ =\frac{\Pr(T=a_{j})}{\Pr(T\geq a_{j})} $\\
$ =\frac{f(a_{j})}{\sum_{k:a_{k} \geq a_{j}}f(a_{k})} $\\
\textbf{Cumulative hazard function} $\Lambda(t)$\\
For continuous random variables:\\
$\Lambda(t)=\int_{0}^{t} \lambda(u)\,du $\\
\underline{\textbf{MODELING SURVIVAL DATA WITH SOME PARAMETRIC REGRESSION MODELS.
}}\\
\textbf{THE EXPONENTIAL DISTRIBUTION}\\
$ f(t)=\lambda e^{-\lambda t}$ for $t \geq 0$\\
$ S(t)= \int_{t}^{\infty}f(u)\,du $\\
$ S(t)=e^{-\lambda t} $\\
$\lambda(t)=\frac{f(t)}{S(t)} $\\
$\lambda(t) = \lambda $,\\
which is a constant hazard.\\
$\Lambda(t)= \int_{0}^{t} \lambda(u)\,du $\\
$\Lambda(t)= \int_{0}^{t} \lambda\,du$\\
$\Lambda(t) = \lambda t$\\
\textbf{THE WEIBULL DISTRIBUTION}\\
\textit{(WITH TWO PARAMETERS)}\\
Let $\lambda$ be the scale parameter and $k$ be the shape parameter.\\
$ S(t)= e^{- \lambda t^{k}} $\\
$f(t)=-\frac{d}{dt}S(t)= k \lambda t^{k-1}e^{- \lambda t^{k}}$\\
$ \lambda (t) = k \lambda t^{k-1} $\\
$\Lambda (t) = \int_{0}^{t} \lambda (u)\,du = \lambda t^{k} $\\
The Weibull distribution is convenient because of its simple form. It includes several hazard shapes:\\
$k=1$ for constant hazard\\
$0<k<1$ for decreasing hazard\\
$k>1$ for increasing hazard\\
\textbf{GAMMA DISTRIBUTION}\\
The gamma distribution with parameters $\lambda$ and $k$, denoted $\Gamma (\lambda, k)$, has density\\
$f(t) =\frac{\lambda(\lambda t)^{k-1}e^{-\lambda t}}{\Gamma(k)} $\\
and survivor function\\
$ S(t) = 1-I_{k}( \lambda t) $,\\
where $I_{k}(x)= \int_{0}^{x} u^{k-1}e^{-u}\,du/ \Gamma(k) $ is the incomplete gamma function.\\
There is no closed-form expression for the survival function, but there are excellent algorithms for its computation. There is no explicit formula for the hazard either, but it may be computed easily as the ratio of the density to the survival function, $\lambda(t)= f(t)/S(t)$.\\
The gamma hazard increases monotonically if $k>1$, from a value of 0 at the origin to a maximum of $\lambda$.\\
It is constant for $k=1$ and decreases monotonically if $k<1$, from $\infty$ at the origin to an asymptotic value of $\lambda$.\\
If $k=1$, the gamma reduces to an exponential distribution, which can be described as the waiting time to one hit in a Poisson process.\\
\underline{\textbf{NON-PARAMETRIC ESTIMATION}}\\
\textbf{KAPLAN-MEIER ESTIMATE}\\
Suppose $a_{k}<t \leq a_{k+1}$.
Then\\ $ S(t)= P(T \geq a_{k+1}) $\\ $ S(t)= P(T \geq a_{1}, T\geq a_{2}, ...,T\geq a_{k+1}) $\\ $ S(t)= P(T \geq a_{1}) \times \prod_{j=1}^{k} P(T \geq a_{j+1}\mid T\geq a_{j}) $\\ $ S(t)= \prod_{j=1}^{k} [1- P(T= a_{j}\mid T\geq a_{j})] $ (since $P(T \geq a_{1})=1$)\\ $ S(t)=\prod_{j=1}^{k} [1-\lambda_{j}] $\\ so $ \hat{S}(t) \cong \prod_{j=1}^{k} (1- \frac{d_{j}}{r_{j}}) $\\ $ \hat{S}(t) =\prod_{j:a_{j}<t} (1- \frac{d_{j}}{r_{j}}) $\\ where $ d_{j}$ is the number of deaths at $a_{j}$ and\\ $r_{j}$ is the number at risk at $a_{j}$.\\ More generally, writing $\tau_{j}$ for the event times $a_{j}$ observed in the sample:\\ $\ast$ $\tau_{1},\tau _{2},...,\tau_{K}$ is the set of $K$ distinct event times observed in the sample \\ $\ast$ $d_{j}$ is the number of events at $\tau_{j}$\\ $\ast$ $r_{j}$ is the number of individuals ``at risk'' right before the $j$-th event time (individuals experiencing the event or censored at or after that time).\\ $\ast$ $c_{j}$ is the number of censored observations between the $j$-th and $(j+1)$-th event times. Censored observations tied at event times are included in $c_{j}$.\\ \textbf{Greenwood's formula}\\ If $ \hat{\lambda}_{j}=\frac{d_{j}}{r_{j}} $\\ then $ \hat{S}(t) =\prod_{j:\tau_{j}<t} (1- \hat{\lambda}_{j}) $\\ Now, instead of dealing with $ \hat{S}(t)$ directly, we will look at the log of it\\ $ \log[\hat{S} (t)] = \sum_{j:\tau_{j}<t } \log(1-\hat{\lambda}_{j}) $\\ Thus, by approximate independence of the $\hat{\lambda}_{j}$'s,\\ $ var(\log[\hat{S} (t)])= \sum_{j:\tau_{j}<t } var[\log(1-\hat{\lambda}_{j}) ] $\\ Applying the delta method to each term,\\ $ var(\log[\hat{S}(t)])\approx \sum_{j:\tau_{j}<t } (\frac{1}{1-\hat{\lambda}_{j}})^{2} var(\hat{\lambda}_{j}) $\\ and, using the binomial variance $var(\hat{\lambda}_{j})=\hat{\lambda}_{j}(1-\hat{\lambda}_{j})/r_{j}$,\\ $ var(\log[\hat{S}(t)])= \sum_{j:\tau_{j}<t } \frac{\hat{\lambda}_{j}}{(1-\hat{\lambda}_{j})r_{j}} $\\ $ var(\log[\hat{S}(t)])= \sum_{j:\tau_{j}<t } \frac{d_{j}}{(r_{j}- d_{j})r_{j}} $\\ Now, $\hat{S}(t)= \exp[\log[\hat{S}(t)]]$, so applying the delta method once more,\\ $ var(\hat{S}(t)) \approx [\hat{S}(t)]^{2} var [\log[\hat{S}(t)]] $\\ \textbf{hence Greenwood's formula:}\\ $ var(\hat{S}(t)) =[\hat{S}(t)]^{2} \sum_{j:\tau_{j}<t } \frac{d_{j}}{(r_{j}- 
d_{j})r_{j}} $\\ \underline{\textbf{COX PROPORTIONAL HAZARD MODEL}}\\ A Cox proportional hazard model is a well-recognized statistical technique for exploring the relationship between the survival of a patient and several explanatory variables. A Cox model provides an estimate of the treatment effect on survival after adjustment for other explanatory variables. It allows us to estimate the hazard (or risk) of death, or other event of interest, for individuals, given their prognostic variables. \\ Interpreting a Cox model involves examining the coefficients for each explanatory variable. A positive regression coefficient for an explanatory variable means that the hazard is higher, and thus the prognosis worse, for patients with higher values of that variable. Conversely, a negative regression coefficient implies a better prognosis for patients with higher values of that variable. Cox's method does not assume any particular distribution for the survival times; rather, it assumes that the effects of the different variables on survival are constant over time and are additive in a particular scale. The Cox model is a semiparametric model. It \begin{itemize} \item makes no assumptions about the form of the baseline hazard $\lambda_{0}(t)$ (nonparametric part of model) \item assumes a parametric form for the effect of the predictors on the hazard \end{itemize} The hazard function gives the instantaneous rate at which an individual experiences an event (for example, death) within a small time interval, given that the individual has survived up to the beginning of the interval. It can therefore be interpreted as the risk of dying at time $t$. \\ \textbf{THE LIKELIHOOD}\\ Let $Y_{i}$ denote the observed time (either censoring time or event time) for subject $i$, and let $C_{i}$ be the indicator that the time corresponds to an event (i.e. if $C_{i} = 1 $ the event occurred and if $ C_{i} = 0 $ the time is a censoring time). 
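\\ As a small illustration of this notation (the values below are hypothetical and are not taken from the study data), consider four subjects with observed pairs\\ $ (Y_{1},C_{1})=(2,1), \hspace{0.5cm} (Y_{2},C_{2})=(3,0), \hspace{0.5cm} (Y_{3},C_{3})=(5,1), \hspace{0.5cm} (Y_{4},C_{4})=(7,0) $\\ Subjects 1 and 3 experienced the event at times 2 and 5, while subjects 2 and 4 were censored at times 3 and 7. The subjects still at risk at an event time $t$ are those with $Y_{j}\geq t$, so the risk set at $t=2$ is $\{1,2,3,4\}$ and the risk set at $t=5$ is $\{3,4\}$. Only these risk sets enter the denominators of the likelihood contributions. 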
\\ The hazard function for the Cox proportional hazard model has the form: $$ \lambda(t \mid X)=\lambda_{0} (t)\exp(\beta_{1} x_{1} + ...+ \beta_{p}x_{p}) $$ $$ =\lambda_{0} (t)\exp(X\beta') $$ where $X=(x_{1},..., x_{p}) $ are the explanatory/predictor variables, $\beta=(\beta_{1},...,\beta_{p})$ are the regression coefficients and\\ $\lambda_{0}(t)$ is called the baseline hazard.\\ This expression gives the hazard at time $t$ for an individual with covariate vector (explanatory variables) $X $. Based on this hazard function, and with the subjects ordered by increasing observed time so that subjects $i, i+1, ..., n$ are those still at risk at the $i$th time, the contribution of subject $i$ (with covariate vector $X_{i}$) to the likelihood is $$ L_{i}=\left\{ \frac{\exp(X_{i}\beta')}{\exp(X_{i}\beta') +\exp(X_{i+1}\beta') + ...+ \exp(X_{n}\beta')}\right\}^{C_{i}} $$ \underline{\textbf{The partial likelihood}}\\ The hazards included in the denominator are only those of individuals who are at risk at the $i$th event (or censoring) time. The entire partial likelihood can be expressed very concisely as $$ PL= \prod_{i=1}^{n}\left( \frac{\exp(X_{i}\beta')}{ \sum_{j=1}^{n}Y_{ij}\exp(X_{j}\beta') }\right)^{C_{i}} $$ where $Y_{ij}=1$ if $Y_{j}\geq Y_{i}$ and $Y_{ij}=0$ if $ Y_{j}<Y_{i}$, and $C_{i}$ is the event indicator defined above.\\ \textbf{ The log partial likelihood }\\ This is given by $$ l(\beta)= \log PL= \sum_{i:C_{i}=1} \left[ X_{i}\beta' -\log \sum_{j:Y_{j}\geq Y_{i}} \exp(X_{j}\beta') \right] $$ \end{document}
\documentclass[a4paper, 12pt]{article} \renewcommand{\baselinestretch}{2} \addtolength{\oddsidemargin}{-.575in} \addtolength{\evensidemargin}{-.875in} \addtolength{\textwidth}{1.35in} \addtolength{\topmargin}{-.275in} \addtolength{\textheight}{1.75in} \begin{document} \begin{center} \textbf{STATISTICAL ANALYSIS OF REPORTED CHOLERA CASES IN GHANA. A CASE STUDY IN KORLE BU POLYCLINIC} \end{center} \textbf{GROUP MEMBERS }\\ BIDINLIB MATTHEW 6207711\\ AFELETEY ISAAC PROMISE 6205211\\ NARTEY EUNICE KAKIE (MISS) 6209511\\ \textbf{PROJECT SUPERVISOR }\\ DR. G. OKYERE \begin{center} INTRODUCTION. \end{center} Cholera is an infection of the small intestine. It is caused by eating food or drinking water contaminated with a bacterium called \textit{Vibrio cholerae}. It causes severe watery diarrhea and vomiting, which can lead to dehydration and even death if untreated. Every year, there are an estimated 3 to 5 million cholera cases and 100 000 to 120 000 deaths due to cholera. The short incubation period of two hours to five days enhances the potentially explosive pattern of outbreaks (WHO, 2014). About 75\% of people infected with \textit{Vibrio cholerae} do not develop any symptoms, although the bacteria are present in their faeces for 7 to 14 days after infection and are shed back into the environment, potentially infecting other people. Among people who develop symptoms, 80\% have mild or moderate symptoms, while around 20\% develop acute watery diarrhea with severe dehydration. This can lead to death if untreated (WHO, 2014). Cholera was first identified in the early 1800s in Asia. \\ The first pandemic occurred in the Bengal region of India from 1817 to 1824. The disease spread from India to Southeast Asia, China, Japan, the Middle East, and southern Russia. The disease is most common in places with poor sanitation, crowding, war, and famine. Common locations include parts of Africa, south Asia, and Latin America. \\ Ghana has seen outbreaks of the disease since the 1970s. 
Between 1970 and 2012, Ghana recorded a total of 5,498 cholera deaths, according to data compiled by the World Health Organization. According to the statistics, 1,546 deaths were recorded between 1970 and 1980 while 2,258 deaths were recorded between 1981 and 1990. Between 1991 and 1999, cholera claimed 1,067 lives, and between 2000 and 2012, 627 deaths were recorded (ghanaweb.com). \\ Considering death as an event, one would want to assess the time to event, that is, the probability that an individual will experience the event within a given duration of time. This is a typical case of survival analysis, which looks at time to event data. The Cox proportional hazard model in survival analysis is a good tool to consider. This will help us explore the relationship between the survival of a cholera patient and several explanatory variables like \begin{itemize} \item The age of patient \item Educational status of patient \item Gender of patient \item Geographical location of patients \item Socio economic status of patient. \end{itemize} In this chapter, an overview of the Cox proportional hazard model is given; a brief description of the problem statement of the thesis is also presented together with the objectives, the methodology, the justification and the organization of the thesis. \newpage \textbf{ BACKGROUND OF STUDY}\\ Survival analysis is the modern name given to the collection of statistical procedures which accommodate time-to-event censored data. It is concerned with studying the time between entry to a study and a subsequent event (such as death). Survival analysis typically focuses on time to event data. 
In the most general sense, it consists of techniques for positive-valued random variables, such as \begin{itemize} \item Time to death \item Time to onset (or relapse) of a disease \item Duration of a strike \item Money paid by health insurance \end{itemize} We may be interested in characterizing the distribution of ``time to event'' for a given population as well as comparing this ``time to event'' among different groups (e.g., treatment vs. control in a clinical trial or an observational study), or modeling the relationship of ``time to event'' to other covariates (sometimes called prognostic factors or predictors). Typically, in biomedical applications the data are collected over a finite period of time and consequently the ``time to event'' may not be observed for all the individuals in our study population (sample). This results in what is called censored data. That is, the ``time to event'' for those individuals who have not experienced the event under study is censored (by the end of study).\\ It is also common that the amount of follow-up for the individuals in a sample varies from subject to subject. Survival analysis examines and models the time it takes for events to occur. The prototypical such event is death, from which the name ``survival analysis'' and much of its terminology derives, but the ambit of application of survival analysis is much broader. Essentially the same methods are employed in a variety of disciplines under various rubrics, for example, ``event-history analysis'' in sociology. In this thesis, therefore, terms such as survival are to be understood generically.\\ Survival analysis focuses on the distribution of survival times. Although there are well known methods for estimating unconditional survival distributions, most interesting survival modeling examines the relationship between survival and one or more predictors, usually termed covariates in the survival-analysis literature (John Fox, 2002). 
A Cox proportional hazard model is a well-recognized statistical technique for exploring the relationship between the survival of a patient and several explanatory variables. A Cox model provides an estimate of the treatment effect on survival after adjustment for other explanatory variables. It allows us to estimate the hazard (or risk) of death, or other event of interest, for individuals, given their prognostic variables.\\ The Cox model is based on a modeling approach to the analysis of survival data. The purpose of the model is to simultaneously explore the effects of several variables on survival. When it is used to analyse the survival of patients in a clinical trial, the model allows us to isolate the effects of treatment from the effects of other variables. The model can also be used, a priori, if it is known that there are other variables besides treatment that influence patient survival and these variables cannot be easily controlled in a clinical trial. Using the model may improve the estimate of treatment effect by narrowing the confidence interval. Survival times now often refer to the development of a particular symptom or to relapse after remission of a disease, as well as to the time to death. Cox's method does not assume any particular distribution for the survival times; rather, it assumes that the effects of the different variables on survival are constant over time and are additive in a particular scale. Interpreting a Cox model involves examining the coefficients for each explanatory variable. A positive regression coefficient for an explanatory variable means that the hazard is higher, and thus the prognosis worse, for patients with higher values of that variable. Conversely, a negative regression coefficient implies a better prognosis for patients with higher values of that variable (http://www.xlstat.com/en/). 
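\\ As a brief worked illustration of this interpretation (the coefficient values are hypothetical and are not estimates from this study): under the Cox model, a one-unit increase in a covariate with coefficient $\beta$, holding the other covariates fixed, multiplies the hazard by the hazard ratio $e^{\beta}$. For example,\\ $ \beta = 0.7 \Rightarrow e^{0.7} \approx 2.01 $ (the hazard roughly doubles)\\ $ \beta = -0.7 \Rightarrow e^{-0.7} \approx 0.50 $ (the hazard is roughly halved)\\ $ \beta = 0 \Rightarrow e^{0} = 1 $ (the covariate has no effect on the hazard) 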
\newpage \textbf{PROBLEM STATEMENT: }\\ The outbreak of cholera in Ghana saw many people being infected by the deadly disease. Because it is a deadly disease, we are not certain whether a patient will survive or die. Some people attribute the survival of a cholera patient to treatment, others attribute it to gender, while others attribute it to chance. There is therefore a need to explore the factors that cause the death of a patient. Since the lives of people are involved here, one would not want to guess what will influence the death of a patient. It is therefore necessary to empirically know whether a patient will experience the event of death given some explanatory variables. The specific problem this thesis seeks to solve is to mathematically model the relationship between the survival of a patient and several explanatory variables. \\ \textbf{ OBJECTIVE OF STUDY} \begin{itemize} \item To mathematically model the relationship between the survival of a patient and some explanatory variables \item To determine the multiplicative effect of each variable on the hazard \item To determine the probability that an individual will experience the event of death within a small time interval, given that the individual has survived up to the beginning of the interval \end{itemize} \textbf{ METHODOLOGY}\\ The Cox PH model in survival analysis will help us model this problem mathematically using some explanatory variables, which are the factors that can affect the death of a patient. It will also help us assess the multiplicative effect of each variable on the hazard and assess the probability that an individual will experience an event (for example, death) within a small time interval, given that the individual has survived up to the beginning of the interval.\\ 
\textbf{ JUSTIFICATION OF THE STUDY}\\ The study would help to reduce further deaths among infected cholera patients by identifying the main factor or variable that has the lowest multiplicative effect on the hazard of a patient, and ways to reduce or manage such a variable. This will also help the Ministry of Health, and the government for that matter, to properly allocate the resources that are to be used to address the issue, so that unnecessary allocations will not be made, thereby reducing government expenditure on cholera. By determining the probability of a patient experiencing the event of death after surviving to a given time interval, the study will help health practitioners take more pragmatic steps to address the issue with urgency at that time interval in order to prevent more deaths. The study will also help the selected hospitals of study effectively handle cholera cases in the near future.\\ \textbf{ LIMITATION OF STUDY}\\ Due to limited funds and time constraints, this thesis focuses on three selected hospitals: Korle Bu Polyclinic, Mamprobi Polyclinic and Tema General Hospital, all from Accra.\\ \textbf{ORGANIZATION OF THESIS.}\\ The thesis is organized as follows: In Chapter one, we present the background study of the Cox PH model. Chapter two is devoted to a review of related works in the field of cholera and the Cox PH model. Chapter three deals with the methodology used in the formulation of variants, models and the methods and solutions. Chapter four deals with data collection and analysis. Chapter five is the final chapter and it provides the conclusion and recommendations of this study. \newpage \begin{center} \underline{\textbf{CHAPTER 2}}\\ \end{center} \begin{center} \underline{ \textbf{LITERATURE REVIEW }} \end{center} \textit{King et al. (1979)} compared the abilities of three diets to keep rats tumor-free. 
They were interested in the relationship between diet and the development of tumors and therefore divided 90 rats into three groups and fed them low-fat, saturated fat, and unsaturated fat diets, respectively. The rats were of the same age and species and were in similar physical condition. An identical amount of tumor cells was injected into a foot pad of each rat. The rats were observed for 200 days. Many developed a recognizable tumor early in the study period. Some were tumor-free at the end of the 200 days. Rat 16 in the low-fat group and rat 24 in the saturated group died accidentally after 140 days and 170 days, respectively, with no evidence of tumor. Fifteen of the 30 rats on the low-fat diet developed a tumor before the experiment was terminated. The rat that died had a tumor-free time of at least 140 days. The other 14 rats did not develop any tumor by the end of the experiment; their tumor-free times were at least 200 days. Among the 30 rats in the saturated fat diet group, 23 developed a tumor, one died tumor-free after 170 days, and six were tumor-free at the end of the experiment. All 30 rats in the unsaturated fat diet group developed tumors within 200 days. The two early deaths can be considered losses to follow-up. The data are singly censored if the two early deaths are excluded.\\ Stephen J. Walters of the School of Health and Related Research (ScHARR), University of Sheffield (2009), carried out a Cox PH analysis on the data from a randomized trial comparing the effect of low-dose adjuvant interferon alfa-2a therapy with that of no further treatment in patients with malignant melanoma at high risk of recurrence. Malignant melanoma is a serious type of skin cancer, characterized by uncontrolled growth of pigment cells called melanocytes. 
In this trial, 674 patients with a radically resected malignant melanoma (who were at high risk of disease recurrence) were randomly assigned to one of two treatment groups: interferon (3 megaunits of interferon alfa-2a three times a week until recurrence of cancer, or for two years, whichever occurred first) or no further treatment. The primary aim of this multicentre study was to determine the effects of interferon on overall survival. Patients were followed for up to eight years from randomization. The final Cox model included two demographic variables (age and gender) and one baseline clinical variable (histology) as independent prognostic factors, plus a treatment variable. Model: $ \mbox{cox}=0.004\,\mbox{Age} - 0.312\,\mbox{Sex} - 0.033\,\mbox{Histology1} + 0.446\,\mbox{Histology2} + 0.569\,\mbox{Histology3} - 0.090\,\mbox{Group} $\\ It was observed that older age and regionally metastatic cancer histology are associated with poorer survival, whereas being male is associated with better survival.\\ \textit{Masaaki Tsujitani et al. (2012)} discussed a flexible method for modeling survival data using penalized smoothing splines when the values of covariates change for the duration of the study. The Cox proportional hazards model has been widely used for the analysis of treatment and prognostic effects with censored survival data. However, a number of theoretical problems with respect to the baseline survival function remain unsolved. They used generalized additive models (GAMs) with B-splines to estimate the survival function and selected the optimum smoothing parameters based on a variant multifold cross-validation (CV) method. The methods were compared with the generalized cross-validation (GCV) method using data from a long-term study of patients with primary biliary cirrhosis (PBC).\\ In total, 54,519 people from the placebo clusters were assembled. The incidence of cholera (1.30/1000/year) was significantly higher than that of V. parahaemolyticus diarrhea (0.63/1000/year). 
Cholera incidence was inversely related to age, whereas the risk of V. parahaemolyticus diarrhea was age-independent. The seasonality of diarrhea due to the two Vibrio species was similar. Cholera was distinguished by a higher frequency of severe dehydration, and V. parahaemolyticus diarrhea by abdominal pain. Hindus and those who live in households not using boiled or treated water were more likely to have V. parahaemolyticus diarrhea. Young age, low socioeconomic status, and living closer to a project healthcare facility were associated with an increased risk for cholera. The high risk area for cholera differed from the high risk area for V. parahaemolyticus diarrhea. The authors report coexistence of the two vibrios in the slums of Kolkata. The two etiologies of diarrhea had a similar seasonality but had distinguishing clinical features. The risk factors and the high risk areas for the two diseases differ from one another, suggesting different modes of transmission of these two pathogens (\textit{Kanungo et al., 2012}).\\ \textit{Ali M. et al. (2012)} evaluated the herd protection conferred by an oral cholera vaccine using 2 approaches: cluster design and geographic information system (GIS) design. Residents living in 3933 dwellings (clusters) in Kolkata, India, were cluster-randomized to receive either cholera vaccine or oral placebo. Nonpregnant residents aged $\geq 1$ year were invited to participate in the trial. Only the first episode of cholera detected for a subject between 14 and 1095 days after a second dose was considered. In the cluster design, indirect protection was assessed by comparing the incidence of cholera among nonparticipants in vaccine clusters vs those in placebo clusters. In the GIS analysis, herd protection was assessed by evaluating the association between vaccine coverage among the population residing within 250 m of the household and the occurrence of cholera in that population. 
Among 107 347 eligible residents, 66 990 received 2 doses of either cholera vaccine or placebo. In the cluster design, the 3-year data showed significant total protection (66\% protection; 95\% confidence interval [CI], 50\%--78\%; $P < .01$) but no evidence of indirect protection. With the GIS approach, the risk of cholera among placebo recipients was inversely related to neighborhood-level vaccine coverage, and the trend was highly significant ($P < .01$). This relationship held in multivariable models that also controlled for potentially confounding demographic variables (hazard ratio, 0.94 [95\% CI, .90--.98]; $P < .01$). They concluded that indirect protection was evident in analyses using the GIS approach but not the cluster design approach, likely owing to considerable transmission of cholera between clusters, which would vitiate herd protection in the cluster analysis.\\ A sample of 432 inmates released from Maryland state prisons was followed for one year after release. The event of interest was the first rearrest. The aim was to determine how the occurrence and timing of arrests depended on several covariates (predictor variables). Some of these covariates (like race, age at release, and number of previous convictions) remained constant over the one-year interval. Others (like marital status and employment status) could change at any time during the follow-up period. It was observed that fully 75 percent of the cases were not rearrested during the first year after release. It was also noted that someone who is jailed after an arrest is not likely to be working full time in subsequent weeks (Rossi et al., 1980). \\ The analytical effect of the number of events per variable (EPV) in a proportional hazards regression analysis was evaluated using Monte Carlo simulation techniques for data from a randomized trial containing 673 patients and 252 deaths, in which seven predictor variables had an original significance level of $p < 0.10$. 
The 252 deaths and 7 variables correspond to 36 events per variable analyzed in the full data set. Five hundred simulated analyses were conducted for these seven variables at EPVs of 2, 5, 10, 15, 20, and 25. For each simulation, a random exponential survival time was generated for each of the 673 patients, and the simulated results were compared with their original counterparts. As EPV decreased, the regression coefficients became more biased relative to the true value; the 90\% confidence limits about the simulated values did not have a coverage of 90\% for the original value; large sample properties did not hold for variance estimates from the proportional hazards model; and the Z statistics used to test the significance of the regression coefficients lost validity under the null hypothesis. Although a single boundary level for avoiding problems is not easy to choose, the value of EPV = 10 seems most prudent. Below this value for EPV, the results of proportional hazards regression analyses should be interpreted with caution because the statistical model may not be valid (\textit{Peduzzi et al., 1995}).\\ Efficacy and safety of a two-dose regimen of bivalent killed whole-cell oral cholera vaccine (Shantha Biotechnics, Hyderabad, India) to 3 years is established, but long-term efficacy is not. \textit{Dipika Sur et al. (2011)} aimed to assess protective efficacy up to 5 years in a slum area of Kolkata, India. In their double-blind, cluster-randomized, placebo-controlled trial, they assessed the incidence of cholera in non-pregnant individuals older than 1 year residing in 3933 dwellings (clusters) in Kolkata, India. They randomly allocated participants, by dwelling, to receive two oral doses of modified killed bivalent whole-cell cholera vaccine or heat-killed Escherichia coli K12 placebo, 14 days apart. Randomization was done by use of a computer-generated sequence in blocks of four. 
The primary endpoint was prevention of episodes of culture-confirmed Vibrio cholerae O1 diarrhea severe enough for patients to seek treatment in a health-care facility. They identified culture-confirmed cholera cases among participants seeking treatment for diarrhea at a study clinic or government hospital between 14 days and 1825 days after receipt of the second dose. They assessed vaccine protection in a per-protocol population of participants who had completely ingested two doses of assigned study treatment. They observed that 69 of 31 932 recipients of vaccine and 219 of 34 968 recipients of placebo developed cholera during the 5-year follow-up (incidence 2.2 per 1000 in the vaccine group and 6.3 per 1000 in the placebo group). Cumulative protective efficacy of the vaccine at 5 years was 65\% (95\% CI 52--74; $p<0.0001$), and point estimates by year of follow-up suggested no evidence of decline in protective efficacy. They noted that sustained protection for 5 years at the level reported had not been observed previously with other oral cholera vaccines. Established long-term efficacy of this vaccine could assist policy makers in formulating rational vaccination strategies to reduce the overall cholera burden in endemic settings (\textit{Dipika Sur et al., 2011}).\\ \textit{Tiago dos Santos Ferreira et al. (2012)} noted that infection increases the morbidity and mortality in liver cirrhosis patients. The aim of their study was to investigate the impact of infection related to survival and risk factors for death in adult patients with liver cirrhosis in a university hospital. In a retrospective cohort study of Brazilian hospitalized cirrhotic patients, medical records data were analysed, and all patients who had one or more confirmed bacterial infections during admission were selected for the study. Data such as biochemical investigations, Child score, MELD estimation, clinical evolution and death events were also included. 
Statistical analysis: chi-square, Fisher and Mann-Whitney tests were used. Univariate and multivariate analyses were performed according to the Cox regression model. The significant statistical level was p 2.5 mg/dl had increased the risk of death of 4.1, 3.2 and 3.2, respectively. Conclusion: Bacterial infections in hospitalized cirrhotic patients deserve special care, mainly spontaneous bacterial peritonitis, as do patients in whom hyponatremia, upper gastrointestinal bleeding, high levels of creatinine and a high MELD score are found.\\ \textit{Durham et al. (1998)} estimated the efficacy of killed whole-cell-only (WC) and B subunit killed whole-cell (BS-WC) oral cholera vaccines over 4 1/2 years of a vaccine trial in rural Matlab, Bangladesh. The placebo was a killed Escherichia coli strain. The trial was randomized and double-blinded among 89,596 subjects aged 2--15 years (male and female) and greater than 15 years (females only). They restricted their analyses to subjects that received three doses of vaccine or placebo (i.e., the full vaccination regimen) before May 1, 1985. There were 20,837, 20,743, and 20,705 such subjects in the placebo, WC, and BS-WC arms of the trial, respectively. The events of interest were reported, confirmed cases of cholera illness among the study subjects. Cases were classified into the two major biotypes that circulated during the trial, classic and El Tor cholera. They computed Kaplan-Meier estimates of the cholera case survival curves, $S(t)$, for the placebo and two vaccine groups. They observed that the good separation between the vaccine and placebo curves indicates that the vaccines give protection. The BS-WC vaccine provides better protection during the first year. The curves slowly approach one another, indicating the waning of protective effect, but this is difficult to see with plots based on cumulative incidence. To estimate smooth plots of the vaccine efficacy $VE(t)$, 
they used a method based on smoothing scaled residuals from a proportional hazards model. In general, they coded vaccine effects with a dichotomous variable $ z = 1 $ for vaccine and $z = 0$ for placebo, with $\beta(t)$ as the time-varying coefficient for the vaccine effect. Their goal was then to find the smoothed VE estimate $ VE(t) = 1 - RR(t) = 1 - e^{\beta(t)}$ and its standard error. The smoothing was carried out with regression splines with four degrees of freedom. In those analyses which were not stratified by age group, they controlled for age effects by including a term for age group in the model. \newpage \begin{center} \underline{\textbf{CHAPTER 3}}\\ \end{center} \begin{center} \underline{\textbf{METHODOLOGY}}\\ \end{center} This chapter provides discussions of the methods for examining and modeling the time it takes for events to occur. In order to understand the Cox PH method of examining and modeling, it is necessary to have a good understanding of some of the methods of examining time to event data. Survival analysis is a branch of statistics which deals with the analysis of time duration until one or more events happen, such as death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysis in engineering, and duration analysis or duration modeling in economics. Survival analysis attempts to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? 
How do particular circumstances or characteristics increase or decrease the probability of survival?\\ Since Cox PH is a form of (semiparametric) regression, we will first discuss a few things about regression.\\ \underline{\textbf{LINEAR REGRESSION}}\\ Linear regression describes a relation between some explanatory (predictor) variables and a variable of special interest, called the response variable. We use a predictor or ``independent'' variable $x$ to explain some of the uncertainty in a ``dependent'' variable $y$. It helps to understand the relation between explanatory and response variables and to predict the value of the response variable for new values of the explanatory variables.\\ \underline{\textbf{Simple Linear Regression}}\\ $$ Y_{i}= \beta _{0} + \beta _{1} x_{i} + \varepsilon _{i} \hspace{1cm} i=1,...,n $$ where:\\ $Y$ is the response\\ $x$ is the predictor\\ $\beta _{0}$ and $\beta _{1}$ are regression coefficients\\ $\varepsilon $ is an error term, such that\\ $ E[\varepsilon_{i}]= 0 \hspace{1cm} \mbox{for all} \: i $\\ $var(\varepsilon_{i})= \sigma^{2} \hspace{1cm} \mbox{for all} \: i $\\ $cov(\varepsilon_{i},\varepsilon_{j})= 0\hspace{1cm} \mbox{for} \: i\ne j $\\ Estimates of $ \beta _{0}$ and $\beta _{1} $ are determined by the least squares approach.\\ \underline{\textbf{Multiple Regression}}\\ $ Y_{i}=\beta _{0} +\beta _{1} x_{i1} +\beta_{2} x_{i2}+ ... + \beta_{p} x_{ip}+ \varepsilon _{i}, \hspace{1cm} i=1,...,n $\\ where:\\ $Y$ is the response\\ $x_{1},..., x_{p}$ are predictors\\ $\beta_{0},..., \beta_{p}$ are regression coefficients. \\ \underline{\textbf{{\small SURVIVAL ANALYSIS AND COX PROPORTIONAL HAZARDS MODEL}}}\\ Survival analysis is the modern name given to the collection of statistical procedures which accommodate time-to-event censored data. It is concerned with studying the time between entry to a study and a subsequent event (such as death). Survival analysis typically focuses on time to event data. 
In the most general sense, it consists of techniques for positive-valued random variables, such as
\begin{itemize}
\item Time to death
\item Time to onset (or relapse) of a disease
\item Duration of a strike
\item Money paid by health insurance
\end{itemize}
Survival models can be viewed as consisting of two parts: the underlying hazard function, often denoted $\lambda_{0}(t)$, which describes how the risk of event per time unit changes over time at baseline levels of covariates; and the effect parameters, which describe how the hazard varies in response to explanatory covariates.\\
Let $T$ represent survival time. We regard $T$ as a random variable with cumulative distribution function
\[ F(t)=Pr(T\leq t) \]
and probability density function
\[ f(t)=\frac{dF(t)}{dt} \]
It will often be convenient to work with the complement of the c.d.f., the survival function
\[ S(t) = Pr(T > t) = 1 - F(t), \]
which gives the probability that the event of interest has not occurred by duration $t$.\\
\underline{\textbf{The Hazard Function}}\\
The hazard function describes the risk that an individual will experience an event (for example, death) within a small time interval, given that the individual has survived up to the beginning of the interval, expressed per unit of time:\\
$ \lambda(t)=\lim_{\Delta t \rightarrow 0} \frac{1}{\Delta t} Pr(t \leq T<t+ \Delta t \mid T \geq t) $\\
$ = \lim_{\Delta t \rightarrow 0} \frac{1}{\Delta t} \frac{Pr([t \leq T<t+ \Delta t] \cap [T \geq t])}{Pr(T \geq t)} $\\
In the first expression, the probability is the conditional probability that the event will occur in the interval $[t, t + \Delta t)$ given that it has not occurred before, and $\Delta t$ is the width of the interval. Dividing one by the other we obtain a rate of event occurrence per unit of time.
Taking the limit as the width of the interval goes down to zero, we obtain an instantaneous rate of occurrence. Since the event $[t \leq T<t+ \Delta t]$ implies $[T \geq t]$, the hazard can be written as\\
$ \lambda(t) = \lim_{\Delta t \rightarrow 0} \frac{1}{\Delta t} \frac{Pr(t \leq T<t+ \Delta t)}{Pr(T \geq t)} $\\
$ \lambda(t)=\frac{f(t)}{S(t)} $\\
In words, the rate of occurrence of the event at duration $t$ equals the density of events at $t$, divided by the probability of surviving to that duration without experiencing the event. Since $f(t) = -\frac{d}{dt}S(t)$, it follows that\\
$ \lambda(t)=- \frac{d}{dt} \log S(t) $\\
If we now integrate from 0 to $t$ and introduce the boundary condition $S(0) =1$ (since the event is sure not to have occurred by duration 0), we can solve the above expression to obtain a formula for the probability of surviving to duration $t$ as a function of the hazard at all durations up to $t$:\\
$ S(t)=\exp\left(- \int_{0}^{t} \lambda(x)\,dx\right) $\\
For discrete random variables taking the values $a_{1}<a_{2}<\ldots$:\\
$ \lambda(a_{j})=\lambda_{j}=Pr(T=a_{j}\mid T\geq a_{j}) $\\
$ =\frac{P(T=a_{j})}{P(T\geq a_{j})} $\\
$ =\frac{f(a_{j})}{\sum_{k:a_{k} \geq a_{j}}f(a_{k})} $\\
\textbf{Cumulative hazard function} $\Lambda(t)$\\
For continuous random variables:\\
$\Lambda(t)=\int_{0}^{t} \lambda(u)\,du $\\
\underline{\textbf{MODELING SURVIVAL DATA WITH SOME PARAMETRIC REGRESSION MODELS}}\\
\textbf{THE EXPONENTIAL DISTRIBUTION}\\
$ f(t)=\lambda e^{-\lambda t}$ for $t \geq 0$\\
$ S(t)= \int_{t}^{\infty}f(u)\,du = e^{-\lambda t} $\\
$\lambda(t)=\frac{f(t)}{S(t)} = \lambda $,\\
which is a constant hazard.\\
$\Lambda(t)= \int_{0}^{t} \lambda(u)\,du = \int_{0}^{t} \lambda\,du = \lambda t$\\
\textbf{THE WEIBULL DISTRIBUTION} \\
\textit{(WITH TWO PARAMETERS)}\\
Let $\lambda$ be the scale parameter and $k$ be the shape parameter.\\
$ S(t)= e^{- \lambda t^{k}} $\\
$f(t)=-\frac{d}{dt}S(t)= k \lambda t^{k-1}e^{- \lambda t^{k}}$\\
$ \lambda (t) = k \lambda t^{k-1} $\\
$\Lambda (t) = \int_{0}^{t} \lambda (u)\,du = \lambda t^{k} $\\
The Weibull distribution is convenient because of its simple form. It includes several hazard shapes:\\
$k=1$ for constant hazard\\
$0<k<1$ for decreasing hazard\\
$k>1$ for increasing hazard\\
\textbf{GAMMA DISTRIBUTION}\\
The gamma distribution with parameters $\lambda$ and $k$, denoted $\Gamma (\lambda, k)$, has density\\
$f(t) =\frac{\lambda(\lambda t)^{k-1}e^{-\lambda t}}{\Gamma(k)} $\\
and survivor function \\
$ S(t) = 1-I_{k}( \lambda t) $,\\
where $I_{k}(x)= \int_{0}^{x} u^{k-1}e^{-u}\,du/ \Gamma(k) $ is the incomplete gamma function.\\
There is no closed-form expression for the survival function, but there are excellent algorithms for its computation. There is no explicit formula for the hazard either, but it may be computed easily as the ratio of the density to the survival function, $\lambda(t)= f(t)/S(t)$.\\
The gamma hazard increases monotonically if $k>1$, from a value of 0 at the origin to an asymptotic maximum of $\lambda$.\\
It is constant for $k=1$ and decreases monotonically if $k<1$, from $\infty$ at the origin to an asymptotic value of $\lambda$.\\
If $k=1$, the gamma reduces to an exponential distribution, which can be described as the waiting time to one hit in a Poisson process. \\
\underline{\textbf{NON-PARAMETRIC ESTIMATION}}\\
\textbf{KAPLAN-MEIER ESTIMATE}\\
Suppose $a_{k}<t \leq a_{k+1}$.
Then, using the convention $S(t)=P(T\geq t)$ for discrete event times,\\
$ S(t)= P(T \geq a_{k+1}) $\\
$ = P(T \geq a_{1}, T\geq a_{2}, \ldots,T\geq a_{k+1}) $\\
$ = P(T \geq a_{1}) \times \prod_{j=1}^{k} P(T \geq a_{j+1}\mid T\geq a_{j}) $\\
$ = \prod_{j=1}^{k} [1- P(T= a_{j}\mid T\geq a_{j})] $ \hspace{0.5cm} (since $P(T \geq a_{1})=1$)\\
$ = \prod_{j=1}^{k} [1-\lambda_{j}] $\\
so\\
$ \hat{S}(t) \cong \prod_{j=1}^{k} \left(1- \frac{d_{j}}{r_{j}}\right) $\\
$ \hat{S}(t) =\prod_{j:a_{j}<t} \left(1- \frac{d_{j}}{r_{j}}\right) $\\
where $d_{j}$ is the number of deaths at $a_{j}$ and $r_{j}$ is the number at risk at $a_{j}$. In a sample, the observed event times play the role of the $a_{j}$:
\begin{itemize}
\item $\tau_{1},\tau_{2},\ldots,\tau_{K}$ is the set of $K$ distinct event times observed in the sample
\item $d_{j}$ is the number of events at $\tau_{j}$
\item $r_{j}$ is the number of individuals ``at risk'' right before the $j$-th event time (individuals experiencing the event or censored at or after that time)
\item $c_{j}$ is the number of censored observations between the $j$-th and $(j+1)$-th event times. Censored observations tied at event times are included in $c_{j}$.
\end{itemize}
\textbf{Greenwood's formula}\\
If $ \hat{\lambda}_{j}=\frac{d_{j}}{r_{j}} $\\
then $ \hat{S}(t) =\prod_{j:\tau_{j}<t} (1- \hat{\lambda}_{j}) $\\
Now, instead of dealing with $\hat{S}(t)$ directly, we will look at its log:\\
$ \log[\hat{S} (t)] = \sum_{j:\tau_{j}<t } \log(1-\hat{\lambda}_{j}) $\\
Thus, by approximate independence of the $\hat{\lambda}_{j}$'s,\\
$ \mathrm{var}(\log[\hat{S} (t)])= \sum_{j:\tau_{j}<t } \mathrm{var}[\log(1-\hat{\lambda}_{j}) ] $\\
Applying the delta method to each term gives\\
$ \mathrm{var}(\log[\hat{S}(t)]) \approx \sum_{j:\tau_{j}<t } \left(\frac{1}{1-\hat{\lambda}_{j}}\right)^{2} \mathrm{var}(\hat{\lambda}_{j}) $\\
and, using the binomial approximation $\mathrm{var}(\hat{\lambda}_{j}) \approx \hat{\lambda}_{j}(1-\hat{\lambda}_{j})/r_{j}$,\\
$ \mathrm{var}(\log[\hat{S}(t)]) \approx \sum_{j:\tau_{j}<t } \frac{\hat{\lambda}_{j}}{(1-\hat{\lambda}_{j})r_{j}} $\\
$ = \sum_{j:\tau_{j}<t } \frac{d_{j}}{(r_{j}- d_{j})r_{j}} $\\
Now, $\hat{S}(t)= \exp[\log[\hat{S}(t)]]$, so applying the delta method once more,\\
$ \mathrm{var}(\hat{S}(t)) \approx [\hat{S}(t)]^{2}\, \mathrm{var} [\log[\hat{S}(t)]] $\\
\textbf{Hence Greenwood's formula:}\\
$ \mathrm{var}(\hat{S}(t)) \approx [\hat{S}(t)]^{2} \sum_{j:\tau_{j}<t } \frac{d_{j}}{(r_{j}- d_{j})r_{j}} $\\
\underline{\textbf{COX PROPORTIONAL HAZARD MODEL}}\\
A Cox proportional hazards model is a well-recognized statistical technique for exploring the relationship between the survival of a patient and several explanatory variables. It provides an estimate of the treatment effect on survival after adjustment for other explanatory variables, and allows us to estimate the hazard (or risk) of death, or other event of interest, for individuals, given their prognostic variables. \\
Interpreting a Cox model involves examining the coefficients for each explanatory variable. A positive regression coefficient for an explanatory variable means that the hazard is higher, and thus the prognosis worse, for patients with higher values of that variable. Conversely, a negative regression coefficient implies a better prognosis for patients with higher values of that variable. Cox's method does not assume any particular distribution for the survival times; rather, it assumes that the effects of the different variables on survival are constant over time and are additive on a particular scale (the log-hazard scale). The Cox model is a semi-parametric model. It
\begin{itemize}
\item makes no assumptions about the form of the baseline hazard $\lambda_{0}(t)$ (nonparametric part of the model)
\item assumes a parametric form for the effect of the predictors on the hazard
\end{itemize}
As defined earlier, the hazard function describes the risk that an individual will experience an event (for example, death) within a small time interval, given that the individual has survived up to the beginning of the interval. It can therefore be interpreted as the risk of dying at time $t$. \\
\textbf{THE LIKELIHOOD}\\
Let $Y_{i}$ denote the observed time (either censoring time or event time) for subject $i$, and let $C_{i}$ be the indicator that the time corresponds to an event (i.e. $C_{i} = 1$ if the event occurred and $C_{i} = 0$ if the time is a censoring time).
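\\ As a small illustration of this notation, consider the following hypothetical records (the values are invented for illustration only and are not taken from the Korle Bu data), with time measured in days:
\begin{center}
\begin{tabular}{ccc}
Subject $i$ & $Y_{i}$ (days) & $C_{i}$ \\ \hline
1 & 2 & 1 \\
2 & 3 & 0 \\
3 & 5 & 1 \\
4 & 7 & 0 \\
\end{tabular}
\end{center}
Subjects 1 and 3 experienced the event on days 2 and 5. Subjects 2 and 4 were censored: all that is known is that their event times exceed 3 and 7 days respectively. Just before day 5, only subjects 3 and 4 are still under observation, so they are the individuals at risk at that time.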
\\ The hazard function for the Cox proportional hazards model has the form:
$$ \lambda(t \mid X)=\lambda_{0} (t)\exp(\beta_{1} X_{1} + \ldots+ \beta_{p}X_{p}) = \lambda_{0} (t)\exp(X\beta') $$
where $X=(x_{1},\ldots, x_{p})$ are the explanatory (predictor) variables and $\lambda_{0}(t)$ is called the baseline hazard.\\
This expression gives the hazard at time $t$ for an individual with covariate vector (explanatory variables) $X$. Based on this hazard function, and with subjects ordered by their observed times $t_{1} \leq \ldots \leq t_{n}$, the contribution of subject $i$ to the likelihood is
$$ L_{i}=\left\{ \frac{\exp(\beta x_{i})}{\exp(\beta x_{i}) +\exp(\beta x_{i+1}) + \ldots+ \exp(\beta x_{n})}\right\}^{\delta_{i}} $$
where $\delta_{i}$ is the event indicator for subject $i$ (the indicator $C_{i}$ introduced above).\\
\underline{\textbf{The partial likelihood}}\\
The hazards included in the denominator are only those of individuals who are at risk at the $i$th event (or censoring) time. The entire likelihood function can be expressed very concisely as
$$ PL= \prod_{i=1}^{n} \left( \frac{\exp(\beta x_{i})}{ \sum_{j=1}^{n} Y_{ij} \exp(\beta x_{j}) }\right)^{\delta_{i}} $$
where $Y_{ij}=1$ if $t_{j}\geq t_{i}$, and $Y_{ij}=0$ if $t_{j}<t_{i}$.\\
\textbf{The log partial likelihood}\\
This is given by
$$ l(\beta)= \log PL = \sum_{i:\delta_{i}=1} \left[ \beta x_{i} -\log \sum_{j:t_{j}\geq t_{i}} \exp(\beta x_{j}) \right] $$
\end{document}