Risk Prediction and How the HAS-BLED/CHA2DS2-VASC ...
Video Transcription
Hello, everyone. We're going to go ahead and get started. My name is Julie Maubane, and I am the product manager for the LAAO Registry; I will be your moderator for today. We have Dr. Paul Barosi here with us to speak. He is the director of EP and professor of medicine at the VA Eastern Colorado Healthcare System and the University of Colorado. Dr. Barosi has been with the LAAO Registry for many years, from the beginning, and has been the chair of our steering committee, so we are very happy to have him here to speak to us today. I'm going to pass it on to Dr. Barosi. Thank you.

Thank you so much, Julie. It's so great to be back here in person with all of you. I missed these meetings over the last couple of years, and virtual just is not the same. All right, the button on this clicker is not working, but I can do it the old-fashioned way, so we'll move ahead.

I'm here to talk about major adverse event prediction in the LAAO Registry and how the elements of both the HAS-BLED and the CHA2DS2-VASc scores connect with risk scores and risk modeling. This will probably take us 25 to 30 minutes, and we'll have some time for questions and answers afterward. I'm going to talk about the left atrial appendage occlusion registry risk model work group. This is a carefully coordinated dance that requires a lot of people with different areas of expertise; it is not just a matter of throwing a bunch of variables into a model and calling it a risk model. We'll talk about the overall approach to risk modeling, what that entails, and how it differs from, say, an observational study that might be published using a similar statistical model to examine a risk factor and its association with a particular outcome adjusted for various confounding factors. And then we'll talk very specifically about how those come together: how the risk model work group and an understanding of the risk modeling process have come together in the work that Jim Freeman, Emily Ong, and Sarah Zimmerman have done a phenomenal job leading. I am merely the messenger today.

Now, the risk model work group is a multi-stakeholder team with a lot going on. The team is led by Jim Freeman, who can't be here today because of another commitment, so I'm pinch-hitting for him. In fact, I'm not even the second-choice speaker: Emily Ong is one of Jim Freeman's junior colleagues at Yale, and she has done a phenomenal job under Jim's leadership of helping to support this process. Sarah Zimmerman, on the Yale CORE team, is the primary statistical analyst who actually implemented the work the group came up with as a team. The larger team includes members from the Yale Center for Outcomes Research and Evaluation. Jeptha Curtis, who of course is at Yale and is also the chief science officer of NCDR, has been involved; he has successfully led risk prediction modeling efforts in a number of different registries, including the CathPCI Registry and the EP Device Registry, formerly known as the ICD Registry.
There are many people from the NCDR staff, including the science team, as well as Julie, Christina Kutras, and Fernando, whom I see sitting in the front row, and then a large group of physician volunteers, of which I'm just one. So it really is a team.

Now, this may be unfamiliar territory for some, but it's important to establish some baseline, because the rest of the talk doesn't make sense if you don't have some understanding of the difference between an observational study and the process of risk modeling. In an observational study, where we're looking at some risk factor, say trying to figure out whether obstructive sleep apnea is associated with an increased risk of atrial fibrillation in a large group of patients, you might use a large multivariable statistical model such as a logistic regression or a Cox proportional hazards model to make sure you're not dealing with confounding by other variables that may be associated with both the predictor and the outcome. You include a set of factors in the model to adjust for those, to make sure you're dealing with an independent association. The purpose of that kind of study is really to make sense of whether a factor is associated with an outcome: yes, sleep apnea is associated with an increased risk of atrial fibrillation, in a way that informs our understanding of the science, the causal pathways, and so on, and lets us move forward with other work.

Risk modeling is a process that superficially looks very similar, in the sense that you have a large data set and you're working on understanding risk with a large multivariable model, but the purpose is quite different and twofold: first, to use the model to identify a gradient of risk of the outcome based on the factors you're putting into the model, and second, to take what you've gleaned from the model in this cohort and apply it to estimating risk in other patients beyond it. That may sound very similar, but there are some very real things that are tricky about it.

Now, you may be familiar with the CHA2DS2-VASc score, and there are a number of very good mobile app and online calculators you can use. For my patient from clinic last week, I actually used this to calculate his risk. He is 67 years old, so he gets one point for age between 65 and 74. He's male, he does not have congestive heart failure, he has a diagnosis of hypertension but no prior stroke and no vascular disease, and he has diabetes. That gives him a CHA2DS2-VASc score of three, which, as you can see below, corresponds to an estimated stroke risk of 3.2 percent per year based on the Swedish atrial fibrillation cohort. That is essentially what risk modeling is useful for.

In order to do this, though, there are some challenges that need to be overcome, and we need to think pretty carefully about them. The first is the generalizability of the data you're using beyond that cohort.
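For readers who want to see that arithmetic concretely, here is a minimal sketch of the CHA2DS2-VASc tally in Python. The function and field names are illustrative, not taken from any registry software.

```python
def cha2ds2_vasc(age, female, chf, hypertension, stroke_tia, vascular_disease, diabetes):
    """Return the CHA2DS2-VASc point total for one patient."""
    points = 0
    points += 2 if age >= 75 else (1 if age >= 65 else 0)  # age 65-74 = 1 point, >=75 = 2
    points += 1 if female else 0
    points += 1 if chf else 0
    points += 1 if hypertension else 0
    points += 2 if stroke_tia else 0                        # prior stroke/TIA/thromboembolism = 2
    points += 1 if vascular_disease else 0
    points += 1 if diabetes else 0
    return points

# The 67-year-old man from the talk: age 65-74, hypertension, diabetes -> score 3
print(cha2ds2_vasc(age=67, female=False, chf=False, hypertension=True,
                   stroke_tia=False, vascular_disease=False, diabetes=True))
```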
There is a very real possibility of coming up with some understanding based on the data you have, but if those patients are not representative, or if there is something quirky and unusual about that particular group, you may be fooled into thinking that what you found in this group of people is applicable in the real world, and we actually have to look at that formally. This is where development-set and validation-set modeling, sometimes called derivation-set and validation-set modeling, and methods you'll hear about in just a moment, like bootstrapping, give us ways of validating what we're seeing.

Two other concepts are relevant to the process of risk modeling. The first is model discrimination. In the statistical sense, discrimination is how well a model can stratify patients into low risk versus high risk within the cohort. If you have clear gradients of risk, we characterize the model as having good discrimination, and statistically we measure this with the C statistic, which is effectively the area under the receiver operating characteristic (ROC) curve, a term you may hear. Calibration is a separate concept: how well the predicted risk from the model matches the actual observed risk in the real world. It is possible to have a gradient of risk, high-risk and low-risk patients, that still does not match the risk observed in the real world, and we'll see this.

Some other terms you may hear about: the variable selection strategy. When you have a large statistical model with lots of factors, how do you decide which variables go into the model? One way is a forward selection strategy, where you start with an empty model and add individual variables that appear to be associated with the outcome when looked at by themselves. A backward selection strategy is where you throw in everything but the kitchen sink, every possible data element you have, and then remove the things that do not appear to be statistically relevant. And then, importantly for what I'll show you, there are also times when we use what is called forced variable selection: when there is strong biological plausibility, or there are known factors we believe to be relevant, or factors that from a face-validity standpoint we think must be included no matter what the model says. That is called forcing those variables in.

Multicollinearity is a statistical term, and I'm just an electrophysiologist, so forgive me when I'm treading onto thin ice talking about statistics. Multicollinearity is the phenomenon where you have two variables that measure similar sorts of things, and when you put two variables into a model that are too similar to each other, there is a danger that their effects in the model could effectively cancel each other out, giving the illusion that they're not relevant. What do I mean?
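To make discrimination and calibration concrete, here is a small sketch of how each is typically quantified using scikit-learn on synthetic data. The data and numbers are made up; this only illustrates the mechanics, not the registry's actual analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Synthetic cohort: two predictors and a binary adverse-event outcome.
X = rng.normal(size=(5000, 2))
p_true = 1 / (1 + np.exp(-(-3.0 + 0.8 * X[:, 0] + 0.4 * X[:, 1])))
y = rng.binomial(1, p_true)

model = LogisticRegression().fit(X, y)
pred = model.predict_proba(X)[:, 1]

# Discrimination: the C statistic is the area under the ROC curve.
print("C statistic:", roc_auc_score(y, pred))

# Calibration: compare observed event rates with mean predicted risk by decile.
obs_rate, pred_mean = calibration_curve(y, pred, n_bins=10, strategy="quantile")
for o, p in zip(obs_rate, pred_mean):
    print(f"predicted {p:.3f}  observed {o:.3f}")
```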
Let's say, for example, you put both the patient's actual blood pressure and a diagnosis of hypertension into the model, or you put in the diagnosis of hypertension from the past medical history section and the hypertension element of the CHA2DS2-VASc score at the same time. There is a danger of running into multicollinearity effects that could lead you to believe, erroneously, that hypertension is not associated with the risk of whatever outcome you're looking at. So that needs to be considered.

With that in mind, what did we do with the work group? Our charge was to develop a risk model to predict major adverse events, or MAE, in the LAAO Registry. The primary outcome definition includes all of the things we would count as a major adverse event: death, cardiac arrest, myocardial infarction, pericardial effusion requiring intervention, systemic arterial embolism, device embolization, hemorrhagic stroke, ischemic or undetermined stroke, transient ischemic attack, intracranial hemorrhage, major bleeding, or major vascular complication, all occurring in the hospital at the time of the index procedure.

We also had inclusion and exclusion criteria, thinking about which patients should be included. We decided to include all patients at least 18 years of age who were enrolled in the registry and underwent a procedure. And because the direction since a couple of years ago has been all Watchman FLEX, we didn't feel it made a lot of sense in the initial modeling to include data from the original Watchman 2.5 device, which was associated with a substantially higher risk of complications. What's much more relevant to us is the risk moving forward with the Watchman FLEX device. As far as exclusion criteria, we excluded patients whose procedures were canceled prior to vascular access.

Here is the cohort we had at the time we started the analyses: an initial total of almost 134,000 procedures. We then took out patients for whom it was not their first procedure, and then patients who did not receive a Watchman FLEX device. At the end of all of that, we were left with a little over 41,000 patients. Putting this into a flow diagram in the CONSORT style, we looked at the range from quarter three of 2020 to quarter three of 2021; you can see the same numbers and where the exclusions were pulled out. Among those roughly 41,000 patients, we divided the cohort into roughly 70% as the development set and 30%, almost 12,500 patients, as the validation set. This process is important because it allows you to look at both discrimination and calibration, the two concepts I just described, and we'll see this in a moment.

So what variables did we actually consider? It really was the kitchen sink; we included lots of stuff.
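A development/validation split like the one described here is straightforward to set up; the sketch below uses scikit-learn on a placeholder cohort. The data are synthetic and the proportions simply mirror the 70/30 split mentioned in the talk.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical cohort: one row per first-time Watchman FLEX procedure.
n_patients = 41_000
X = np.random.default_rng(1).normal(size=(n_patients, 10))        # candidate predictors
y = np.random.default_rng(2).binomial(1, 0.02, size=n_patients)   # in-hospital MAE flag

# 70% development (derivation) set, 30% validation set.
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

print(len(y_dev), "development patients,", len(y_val), "validation patients")
```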
We included age; sex assigned at birth; congestive heart failure as defined in the CHA2DS2-VASc score; New York Heart Association class; evidence of left ventricular dysfunction; the CHA2DS2-VASc versions of hypertension, diabetes, stroke, TIA or thromboembolism, and known vascular disease; poorly controlled hypertension from the HAS-BLED score, where again there is the potential for multicollinearity that we knew we would have to anticipate; HAS-BLED abnormal liver function, labile INR, alcohol use, antiplatelet medication use, and NSAID use; clinically relevant prior bleeding; increased fall risk; the various cardiomyopathies and their subtypes; comorbidities such as coronary artery disease and chronic lung disease; the subtypes of atrial fibrillation, namely paroxysmal, persistent, long-standing persistent, and permanent; prior attempts at atrial fibrillation termination in those subgroups; the presence of valvular atrial fibrillation, rheumatic valve disease, mitral valve replacement, or mitral valve repair; atrial flutter, typical and atypical; prior attempts at termination of atrial flutter; and then physical exam and laboratory findings such as body mass index, systolic and diastolic blood pressure, again with a risk of multicollinearity that could foul up the model if we're not careful, hemoglobin, creatinine, and platelet count.

There were also several variables that we excluded in the process. We removed race from the model; this was felt not to be clinically meaningful, and it is in line with other modeling processes that have been successfully performed in other NCDR registries. We removed insurance payer, thinking that it was also unlikely to be a meaningful determinant of procedural outcomes. We removed several items, such as prior stroke from the HAS-BLED score, because they were already captured by the CHA2DS2-VASc score; again, an issue of multicollinearity. We knew prior stroke might be associated with risk, but we didn't want to foul things up by having it in there two different ways. HAS-BLED abnormal renal function was removed because we already had GFR under consideration, and similarly for HAS-BLED prior bleeding. We did not include a number of variables that either lacked clinical significance or had too much missing data. The modified Rankin score, for example, is a marker of frailty but was already represented by increased fall risk, and there were a number of others. So there were several reasons, and we had to think about this very carefully. This was work for the entire group over several Zoom or WebEx meetings, going through each of these items and making an argument for how to handle it: How do we avoid multicollinearity? How can we be sure we're not going to foul up the model by including this? This negotiation of what we keep and what we don't really took a lot of thought from the whole team you saw.

Now, just as there were some variables that we included and some that we specifically excluded, there were also some variables that we were very deliberate about forcing into the model to ensure that we would get results that were clinically and scientifically valid.
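One common way to screen candidate predictors for the kind of multicollinearity described above is the variance inflation factor. The sketch below, using statsmodels on synthetic data, is only an illustration of that generic technique; it is not the work group's actual screening procedure, and the variable names and thresholds are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
n = 2000

# Synthetic predictors: a systolic blood pressure measurement and a
# hypertension diagnosis largely driven by that same pressure (overlapping information).
sbp = rng.normal(135, 18, n)
htn_dx = (sbp + rng.normal(0, 5, n) > 140).astype(float)
bmi = rng.normal(29, 5, n)

X = add_constant(pd.DataFrame({"sbp": sbp, "htn_dx": htn_dx, "bmi": bmi}))

# A VIF near 1 means a predictor is nearly independent of the others;
# values well above roughly 5-10 are a common warning sign of collinearity.
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X.values, i), 2))
```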
Based on the results of the polling that we all did, we decided that the following variables would be forced into the model because they are scientifically or clinically relevant: age; female sex, which had previously been shown to be associated with worse outcomes, not only with LAAO but with several other cardiovascular procedures as well; the factors in the CHA2DS2-VASc score, which we thought were relevant enough in general that this was how we would capture them when there were several possible ways of doing so; clinically relevant prior bleeding; GFR; and body mass index.

Next, the bootstrapping analysis. Bootstrapping, which I mentioned a little earlier, is a statistical strategy that estimates the proportion of times a candidate variable would be selected into the model if you took a different random sample of patients and repeated the process over and over again. So imagine: we talked about the 41,000 patients we're dividing into 70% as the development set and 30% as the validation set. What if you take those same 41,000 patients, randomly select 70% of them for the development set a thousand times, and run the whole variable selection process each of those thousand times? At the end of that, you can say, for each variable, what is the probability that this variable would be selected to stay in the model?

And this is what we found. Looking at the Watchman FLEX-only model, after that thousand-iteration bootstrapping process, these were the variables with a greater than 50% chance of staying in the model, and these were the ones we kept. The variables in green at the bottom are those we forced into the model because we said they were so scientifically relevant that we had to keep them. The variables in yellow at the top are the ones that had a high enough probability of staying in the model, and that included hemoglobin; in other words, if patients were severely anemic, their risk of complications was substantially higher. Female sex carried a higher risk of complications, probably related to generally smaller heart size compared with many men. Older age carried a greater risk, as did prior attempts at atrial fibrillation ablation, which is probably just a marker of greater overall severity of disease.

You'll remember I mentioned that discrimination, how well you capture gradients of risk, is measured by the C statistic. For this model we had a C statistic of 0.67, which you can see right here, and I would tell you that that's not great. To put a C statistic in context: with perfect discrimination, where the variables you have let you tell absolutely which patients are at low risk and which are at high risk, the C statistic is 1.0. If the model provides essentially no discrimination whatsoever, the value is 0.5. So this model didn't do so great at 0.67. That said, consider the Framingham heart models; how many of you have used the Framingham calculators to estimate 10- or 20-year CHD risk? Those models have C statistics on the order of 0.7 to 0.75. So they're not perfect either, but there is value in them even when discrimination is less than perfect. All right.
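The sketch below shows the general shape of a bootstrap variable-selection tally like the one described here: resample the cohort repeatedly, rerun a selection rule, and count how often each candidate survives. It is synthetic and illustrative only; the selection rule here is an L1-penalized logistic regression used as a stand-in for whatever stepwise procedure the analysts actually used, and 200 resamples stand in for the work group's thousand.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic development data: 8 candidate predictors, only the first 3 matter.
n, p = 8000, 8
X = rng.normal(size=(n, p))
logit = -3.5 + 1.0 * X[:, 0] + 0.6 * X[:, 1] + 0.4 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

n_boot = 200
selected = np.zeros(p)

for _ in range(n_boot):
    idx = rng.choice(n, size=n, replace=True)                 # bootstrap resample
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
    fit.fit(X[idx], y[idx])
    selected += (fit.coef_[0] != 0)                           # variable "stayed in the model"

for j, frac in enumerate(selected / n_boot):
    verdict = "keep" if frac > 0.5 else "drop"
    print(f"variable {j}: selected in {frac:.0%} of resamples -> {verdict}")
```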
For the second model we asked, what if we lower the threshold for staying in the model to a greater than 30% likelihood of being selected? We essentially got the same variables, plus a couple of others that stayed in on the margins, and the C statistic was not terribly different, at 0.676.

When we looked at how the model performed beyond the C statistic, here's what we found. We measured calibration formally with the gamma-zero and gamma-one parameters. My simplistic understanding is that gamma-zero essentially models the intercept; with perfect calibration its value should be about 0, so at minus 0.83 that's not great. Gamma-one is essentially the slope describing how well the predicted and observed risks match up; it should be about 1 if calibration is perfect, and at 0.79 it's less than perfect. The area under the receiver operating characteristic curve, the C statistic, is the 0.67 I just showed you.

If we look at the curves, this is a graphical way for most non-statisticians, I think, to make better sense of discrimination and calibration. In the development cohort on the left, patients with low predicted risk had low actual risk on the y-axis, where that's the major adverse event rate, and as predicted risk increases, actual risk also increases. Not surprisingly, observed and predicted match up pretty well in the development cohort. If you look to the right, the yellow predicted risk and the blue observed risk in the validation cohort don't match up as well. So this is less than perfect discrimination and calibration, which is exactly why we do this.

So we had issues with our model: a lower than ideal C statistic, which we confirmed both numerically and graphically, and calibration that was not as good as we would have liked. What were our options? One was to say that the Watchman FLEX had at this point only just been approved, so wait a little longer, get more data, and see what else we can do. A second was to consider expanding the analysis to the entire Watchman cohort and see whether we could find something useful with the data we already had before repeating everything down the road. We could also keep the model as is and go with what we have. And the last thing we considered was changing the outcome in question to something a little less heterogeneous. We decided not to change the outcome, because even though the major adverse event endpoint contains heterogeneous things, major bleeding, perforation, tamponade, stroke, those are all things that matter, even if they are very different events that might individually have different risk factors. So we thought it was important to keep it. Ultimately, we decided to expand to the entire Watchman cohort to see whether we could learn anything from a wider data set.

In the all-Watchman model we looked at all of the procedures, not just FLEX, so we included the Watchman 2.5. In that bootstrapping analysis, we found several other variables that stuck in the model.
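For readers curious what a calibration intercept and slope like the gamma-zero and gamma-one described here correspond to, one common way to estimate them is to regress the observed outcomes on the log-odds of the predicted risks in the validation set. The sketch below is a generic illustration on synthetic data, not the Yale team's actual implementation.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Pretend these are predicted MAE risks and observed outcomes in a validation set.
pred = rng.uniform(0.005, 0.10, size=10_000)
y = rng.binomial(1, pred * 0.8)          # deliberately over-predicting the true risk

logit_pred = np.log(pred / (1 - pred))

# Calibration intercept: intercept-only logistic model with the predicted
# log-odds as a fixed offset; 0 means predictions are not systematically off.
m0 = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial(),
            offset=logit_pred).fit()
print("calibration intercept:", m0.params[0])

# Calibration slope: regress outcomes on predicted log-odds;
# 1 means the spread of the predictions matches the observed spread.
m1 = sm.GLM(y, sm.add_constant(logit_pred), family=sm.families.Binomial()).fit()
print("calibration slope:", m1.params[1])
```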
We found that prior atrial fibrillation termination dropped out, but antiplatelet drug use came in. The other thing I think is interesting is that, now that you've got two different devices, there is the potential for what's called effect modification, or, statistically, an interaction. What this means is that the association of hemoglobin with the risk of an adverse event, or of female sex with the risk of an adverse event, was modified; in other words, which device you got changed the strength of that association. That concept is called a statistical interaction. Specifically, with female sex and the FLEX device, we found that women in this combined model have an increased risk regardless of which device they get, but that risk is 33% lower with the FLEX device compared with the Watchman 2.5 device. That gives a female sex by FLEX interaction term with an odds ratio of 0.77. In the FLEX-only cohort, the odds ratio among women was 1.44 if you got a FLEX device, versus 1.8 if you got the 2.5 device.

Looking at the all-Watchman model that included the interaction terms, we got largely the same findings, and the two interaction terms did stay in the model. But we really didn't do a whole lot better with the C statistic; it's still about 0.67. When we took the all-Watchman model and applied it back only to the FLEX-only cohort, it was still about the same, about 0.67. When we look at the calibration numbers, however, things did look a little better: gamma-zero is a little less negative, so a little closer to zero, and gamma-one moved a little closer to one. So calibration is a little better, but still not great, and the area under the ROC curve is still less than ideal at about 0.66. Graphically, you can see that in the validation cohort the observed data come a little closer to matching the predicted data, but it's still not perfect calibration or discrimination. So at the end of the day, there really is not a lot of difference between the two models.

We still have issues with the model: a less than ideal C statistic, and calibration that is a little better, both graphically and numerically, but still not ideal. And what options do we have? We really still have the same options, and I think we're going to continue with what we have for now.

So over the last 30 minutes or so, we've talked through several things, including why this has to be a team sport, and I described the large work group we have to navigate the challenging work of risk modeling. I gave you a little bit of an overview of the risk modeling process and why it's such a challenging thing to get right. And then we talked about what we actually tackled in the work group, including defining our cohort, what outcome we're working with, how we selected the variables that went into the model, the process of bootstrapping, the variables we forced into the model, and the approaches we took with both the FLEX-only and all-Watchman models.
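The sketch below shows how a device-by-sex interaction term of the kind described here is typically coded in a logistic regression. The data are synthetic, seeded so that female sex raises risk and the FLEX device attenuates that excess risk; it illustrates the general technique, not the registry's actual model or coefficients.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 30_000

# Synthetic all-Watchman cohort with a sex-by-device interaction built in.
female = rng.binomial(1, 0.4, n)
flex = rng.binomial(1, 0.7, n)
logit = -4.0 + np.log(1.8) * female - 0.2 * flex + np.log(0.77) * female * flex
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

df = pd.DataFrame({"mae": y, "female": female, "flex": flex})

# `female:flex` is the interaction term; exponentiating a coefficient gives its odds ratio.
fit = smf.logit("mae ~ female + flex + female:flex", data=df).fit(disp=0)
print(np.exp(fit.params))
```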
So, the last thing before we open it up for questions: I just want to loop back and remind you that this is not something that is simple or can be easily done without the right people and expertise, clinical expertise, statistical expertise, and the leadership of folks from the NCDR team as well. It really is a challenge to get it right. I can't say enough about how Jim Freeman, with Emily and Sarah, has done a phenomenal job leading this effort, and I'm pleased to be able to present it on their behalf. With that, I'll say thank you.

So, we do have a couple of questions. Great. As far as the variables, why was the anatomy not included, like the LV size and the LAA size? Well, those are excellent questions, and we don't actually have great data on the actual dimensions from the registry. I suspect that if we had better definition of the morphology of the left atrial appendage, the various sizes, and the device selections relative to those, we might have been able to say more. I think there's probably more here, and with future updates of the registry, it's possible that we may be able to do better in that regard.

The next question: there are many adverse events collected in the registry; how was it decided which events would be included? Good question. This was largely in alignment with the FDA definitions for what they considered major adverse events in the PROTECT AF and PREVAIL trials. So we were trying our best to be concordant with what was believed to be relevant from a regulatory perspective.

Any discussion of including Watchman FLEX versus the Amulet? That's a great question. At the point that we were starting these analyses, the Amulet had literally just been approved, a matter of a few weeks beforehand. I'm confident that over time, as the model evolves, we will be including Amulet devices, but we weren't there at the point we were performing these analyses.

Will this model take into account hospital patient volume? Yes, it does. I didn't include the data on the risk-standardized incidence rate, in part because it was a bit challenging to present it in a way that was not going to put me to sleep, and I worried about trying to explain it to you all as well. But yes, there were considerations for hospital volume in the model. And let me say one more thing before we take the next question: since the focus really is on calculating risk at the patient level, the hospital-level risk is very much of interest from a quality standpoint, but it's a bit harder to take to an individual patient for that calculation.

And then I think this is the last one. It's a general question about why TIA is not included in the HAS-BLED score. That's just a function of the way the definitions for the HAS-BLED score were written. Both the CHA2DS2-VASc and HAS-BLED scores are already derived and validated, and we couldn't really change them; that's just the way they were constructed.

I think that's it for questions. Awesome. Well, thank you all. I know that to some people, talking about risk modeling is kind of like watching paint dry.
But what I'll say is that in terms of really trying to glean information from the data, and doing the really hard work of trying to get this right, that attention to detail is critical, and I think it will bring value when this risk model starts to appear in the dashboard for participants. Thank you.
Video Summary
In this video, Dr. Paul Barosi discusses major adverse event prediction in the LAAO Registry and how it connects with risk scores and risk modeling. He explains the process of risk modeling, how it differs from observational studies, and emphasizes the importance of careful consideration and collaboration in creating a risk model. Dr. Barosi presents the variables included in the risk model, such as age, sex, medical history, and physical exam findings. He also discusses the challenges of risk modeling, including multicollinearity and the need for model discrimination and calibration. The performance of the risk model is evaluated using the C statistic, calibration plots, and other metrics; the model shows less than ideal discrimination and calibration, indicating room for improvement. Dr. Barosi highlights that risk modeling is a complex task requiring expertise from various stakeholders and acknowledges the contributions of the team, particularly Jim Freeman, Emily Ong, and Sarah Zimmerman. He concludes by addressing audience questions regarding variables, adverse events, device selection, and hospital volume. Overall, the video provides insight into the process and challenges of developing a risk model for the LAAO Registry.
Keywords
major adverse event prediction
LAAO Registry
risk modeling
variables
multicollinearity
model discrimination
calibration
risk model performance