I’m currently working on a project based on pesticide exposures, and we believe looking at the mixture of exposures with Bayesian models is the right choice for our data without just making blanket assumptions. I’ve gotten to where I understand the math behind both BHM and BKMR well, and both can be applied reasonably, but I’m having trouble finding code I can work with to get started. I’ve looked through GitHub, but most of it just becomes a confusing mess, and I don’t want to operate under the assumption that it’s right without being able to understand it myself. Do you have a good source for me to start looking and working from?
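The reference implementation for BKMR is the `bkmr` R package (Bobb et al., on CRAN), whose introductory vignette walks through a mixture analysis step by step and is a better starting point than scraping GitHub. To see the machinery behind a Bayesian hierarchical model itself, here is a minimal Gibbs-sampler sketch in Python for the normal-normal hierarchy; the data are simulated stand-ins, this is not BKMR, and every name in it is hypothetical:

```python
import numpy as np

# --- simulated stand-in data: J exposure groups, n measurements each ---
J, n, sigma = 8, 30, 0.5          # sigma treated as known for simplicity
rng = np.random.default_rng(0)
theta_true = rng.normal(2.0, 1.0, size=J)
y = rng.normal(theta_true[:, None], sigma, size=(J, n))

def gibbs_hierarchical(y, sigma, n_iter=3000, burn=1000, seed=1):
    """Gibbs sampler for y_ij ~ N(theta_j, sigma^2), theta_j ~ N(mu, tau^2),
    with a flat prior on mu and p(tau^2) proportional to 1/tau^2."""
    rng = np.random.default_rng(seed)
    J, n = y.shape
    ybar = y.mean(axis=1)
    mu, tau2 = ybar.mean(), 1.0
    mus, tau2s, thetas = [], [], []
    for _ in range(n_iter):
        # theta_j | mu, tau2: conjugate normal update (precision-weighted mean)
        prec = n / sigma**2 + 1.0 / tau2
        mean = (n * ybar / sigma**2 + mu / tau2) / prec
        theta = rng.normal(mean, np.sqrt(1.0 / prec))
        # mu | theta, tau2: normal update under the flat prior
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / J))
        # tau2 | theta, mu: inverse-gamma draw
        ss = ((theta - mu) ** 2).sum()
        tau2 = 1.0 / rng.gamma(J / 2.0, 2.0 / ss)
        mus.append(mu); tau2s.append(tau2); thetas.append(theta)
    return (np.array(mus)[burn:], np.array(tau2s)[burn:],
            np.array(thetas)[burn:])

mu_post, tau2_post, theta_post = gibbs_hierarchical(y, sigma)
print(mu_post.mean(), tau2_post.mean())
```

Each update is a conjugate draw, so the sampler can be checked line by line against the textbook full conditionals before trusting anything larger.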
biostatisticians continue to teach, and users of biostatistical methods continue to internalise, the idea that regression models provide an all-purpose toolkit that can be implemented more or less agnostically to the actual purpose. A widespread approach is first to “find the best model for the data” and second to develop an appropriate interpretation of the fitted model.
These practices reflect what we describe as the “true model myth”: the notion that the statistical analyst’s primary task is to identify a model that best describes the variation in an outcome in terms of a list of “independent variables”. Finding the best model is rapidly conflated with the idea that the identified model provides a useful approximation to the actual data generating process – from which empirical conclusions can then be drawn.
Carlin, John B., and Margarita Moreno-Betancur. “On the uses and abuses of regression models: a call for reform of statistical practice and teaching.” arXiv preprint arXiv:2309.06668 (2023).
we examined 57 papers (18-20 per journal), in 36 (63%) of which regression methods were used. Among these papers, 25 (69%, or 44% of all papers) exhibited a type of misuse of regression along the lines that we have identified above (see Supplementary Material for details)… The most commonly observed problem was the fitting of multivariable regression models without full consideration of the precise aims of the study, in a manner that exemplifies the “true model myth”. Specifically, we found 10 instances of multiple regression applied to ill-posed questions along the lines of “can we identify the [most important] risk factors for [condition Y]?”. Furthermore, even when a clear research question was identified, we observed frequent misuse of regression, such as inadequately justified “adjustment for covariates” and erroneous interpretation of estimated coefficients.
| descriptive | predictive | causal |
|---|---|---|
| what are common combinations of pesticide exposures? | can we predict asthma from pesticide levels? | what pesticides should we advocate against on health grounds? |
Leek, Jeffrey T., and Roger D. Peng. “What is the question? Mistaking the type of question being considered is the most common error in data analysis.” Science 347.6228 (2015): 1314-1315.
Hernán, Miguel A., John Hsu, and Brian Healy. “A second chance to get causal inference right: a classification of data science tasks.” Chance 32.1 (2019): 42-49.
Description is using data to provide a quantitative summary of certain features of the world. Descriptive tasks include, for example, computing the proportion of individuals with diabetes in a large healthcare database and representing social networks in a community.
The analytics employed for description range from elementary calculations (a mean or a proportion) to fancy clustering or sample size calculations
Prediction is using data to map some features of the world (the inputs) to other features of the world (the outputs). Prediction often starts with simple tasks (quantifying the association between albumin levels at admission and death within one week among patients in the intensive care unit) and then progresses to more complex ones (using hundreds of variables measured at admission to predict which patients are more likely to die within one week).
The analytics employed for prediction range from elementary calculations (a correlation coefficient or a risk difference) to sophisticated pattern recognition methods and supervised learning algorithms that can be used as classifiers (random forests, neural networks) or predict the joint distribution of multiple variables…
Counterfactual prediction is using data to predict certain features of the world as if the world had been different, as is required in causal inference applications. An example of causal inference is the estimation of the mortality rate that would have been observed if all individuals in a study population had received screening for colorectal cancer vs. if they had not received screening.
The analytics employed for causal inference range from elementary calculations in randomized experiments with no loss to follow-up and perfect adherence (the difference in mortality rates between the screened and the unscreened) to complex implementations of g-methods in observational studies with treatment-confounder feedback (the plug-in g-formula).
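The “elementary calculations” at the causal end of this spectrum can be made concrete in a few lines. Below is a sketch of plug-in standardization (the point-treatment g-formula) on simulated data; every number is made up for illustration, and the single binary confounder `L` is an assumption of the toy setup:

```python
import numpy as np

# toy observational data (all numbers hypothetical)
rng = np.random.default_rng(42)
N = 100_000
L = rng.binomial(1, 0.4, N)                       # confounder, e.g. high-risk status
A = rng.binomial(1, np.where(L == 1, 0.7, 0.3))   # screening: more likely if L = 1
Y = rng.binomial(1, 0.05 + 0.12 * L - 0.02 * A)   # death: raised by L, lowered by A

def standardized_risk(a, A, Y, L):
    """Plug-in g-formula for a point treatment:
    sum over l of Pr(Y=1 | A=a, L=l) * Pr(L=l)."""
    return sum(Y[(A == a) & (L == l)].mean() * (L == l).mean() for l in (0, 1))

rd_crude = Y[A == 1].mean() - Y[A == 0].mean()
rd_adj = standardized_risk(1, A, Y, L) - standardized_risk(0, A, Y, L)
print(f"crude RD = {rd_crude:+.3f}, standardized RD = {rd_adj:+.3f}")
```

Because `L` pushes people both into screening and toward the outcome, the crude risk difference comes out with the wrong sign in this simulation; standardizing over `L` recovers the built-in effect of about -0.02.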
| Question | Assumed Question | Actual Question |
|---|---|---|
| “I want to predict who will benefit from a program” | predictive | causal |
| “I want to know how accurate this published risk score is” | predictive | descriptive |
| “I want to understand associations between my exposure and outcome” | descriptive | causal |
“Predictive models always help decision-making”
Lipton, Zachary C. “The Mythos of Model Interpretability.” arXiv preprint arXiv:1606.03490 (2016).
Intussusception is an acute bowel obstruction that occurs infrequently in very young children. It is painful and can be dangerous because it blocks the intestines, so surgeons are called upon to intervene and relieve the blockage. The standard treatment (at the time of the research described here) was to use a “gas enema”, a simple procedure that injects air into the baby’s rectum. The procedure is usually successful, but not always, so the clinical investigators of this study were aiming to understand the extent to which a successful outcome could be predicted using characteristics of the child or their clinical presentation.
Researchers have data derived from a cohort study of all children born in 1961 who were attending school in Tasmania in 1968 (aged 7 years), with this paper reporting a cross-sectional analysis of data collected from the parents at the time of recruitment (n = 8585). Questionnaires were used to determine both the primary outcome of interest (history of asthma in the child) and the risk factors, including child’s sex, other atopic conditions (such as hay fever and eczema), family history of allergic disease and parental smoking. The stated aim was to examine the strength of association of these risk factors with childhood asthma.
Young children who acquire a urinary tract infection may also develop an infection of the kidney known as pyelonephritis. Affected kidneys become enlarged during these infections, which makes it difficult to use ultrasonic measurements of kidney size as a reliable baseline for future assessment of growth. The research described here sought to estimate how much affected kidneys were enlarged, compared to normal kidneys.
I’m trying to make a predictive model to help me diagnose a disease in an African population. I have a random forest model trained from US data and a model trained from US + European data and I want to compare the two. But the problem is that I only have two data points to compare. I’m thinking of training more models like lasso so I have more data points to compare and I can get results I can do statistics with. What do you think?
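A consultation like this usually turns on the realization that each fitted model gives one point estimate of test-set performance, and fitting more model families (lasso, etc.) creates more point estimates, not replicate measurements of the comparison the client cares about. One common alternative is to resample the shared test set: a paired bootstrap of the two models’ predictions gives an uncertainty interval for the difference in a metric. A sketch, with all labels and predictions hypothetical:

```python
import numpy as np

def bootstrap_accuracy_diff(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Percentile bootstrap CI for the accuracy difference between two
    fixed classifiers scored on the same test set (resample test cases,
    not models)."""
    rng = np.random.default_rng(seed)
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    n = len(y_true)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample test cases with replacement
        diffs[b] = ((pred_a[idx] == y_true[idx]).mean()
                    - (pred_b[idx] == y_true[idx]).mean())
    return np.percentile(diffs, [2.5, 97.5])

# hypothetical labels and predictions from the two trained models
y = np.array([0, 1] * 50)
pa = y.copy(); pa[:10] = 1 - pa[:10]   # model A: 90% accurate on this toy set
pb = y.copy(); pb[:20] = 1 - pb[:20]   # model B: 80% accurate on this toy set
lo, hi = bootstrap_accuracy_diff(y, pa, pb)
print(round(lo, 3), round(hi, 3))
```

Because both models are scored on the same resampled cases, the interval respects the pairing. It still says nothing about how either model transports to the African population, which is the separate validation question worth raising with the client.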
What else?
A mathematician, an engineer, and a statistician apply for a job…
A client has done an experiment where they gave students in one 20-person math classroom access to a new and improved textbook while the 20 students in the other classroom had to use the old book. They compared test scores between the classes after the course using a simple rank-sum test and it does seem like the students with the new textbook did better, but the result is only barely statistically significant. They come to you to see if they could do something better.
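The design issue to surface first: the new textbook was assigned at the classroom level, so there are really two clusters, not 40 independent students, and a student-level rank-sum test overstates the evidence. A quick way to show the client the cost of clustering is the standard design-effect formula DEFF = 1 + (m - 1) * ICC; the ICC value below is purely hypothetical:

```python
def effective_sample_size(n_total, cluster_size, icc):
    """Effective sample size after dividing by the design effect
    DEFF = 1 + (m - 1) * ICC, for equal clusters of size m."""
    return n_total / (1 + (cluster_size - 1) * icc)

# 40 students in clusters of 20; a classroom ICC of 0.2 is an assumed value
print(effective_sample_size(40, 20, 0.2))   # far fewer than 40 effective students
```

And with only one classroom per arm, no adjustment can rescue the comparison: the classroom effect is completely confounded with the textbook effect, which is the honest message to deliver.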
I’m doing a project about testing for COVID-19 with dogs who are trained to detect COVID-19 by smell. I want to run an experiment that will compare the sensitivity of the dogs to the sensitivity of a traditional antigen test. How should I determine the sample size?
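Two points usually come before any formula: sensitivity is only estimable from specimens that are truly positive, so the required count of positives must be inflated by prevalence to get a screening total; and the design matters, since presenting the same specimen to both the dog and the antigen test calls for a paired (McNemar-type) calculation rather than the two-independent-groups one sketched here. As a starting point, the standard normal-approximation sample size for comparing two proportions; the sensitivities 0.90 and 0.75 and the 10% prevalence are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def positives_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Positives needed per arm to compare two sensitivities p1, p2
    (two independent proportions, normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * pbar * (1 - pbar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return num / (p1 - p2) ** 2

n_pos = positives_per_arm(0.90, 0.75)
print(round(n_pos))            # positives needed per arm
print(round(n_pos / 0.10))     # specimens to screen at 10% prevalence
```

The prevalence inflation is often the number that changes the client’s plans: needing roughly a hundred confirmed positives per arm can mean screening an order of magnitude more specimens.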
| FISHER | NEYMAN |
|---|---|
| Set up a statistical null hypothesis. Report the exact level of significance (e.g., p = 0.051 or p = 0.049). Do not use a conventional 5% level, and do not talk about accepting or rejecting hypotheses. | Set up two statistical hypotheses, H1 and H2, and decide about α before the experiment based on subjective cost-benefit considerations. If the data falls into the rejection region of H1, accept H2; otherwise accept H1. Note that accepting a hypothesis does not mean that you believe in it, but only that you act as if it were true. Only report what was accepted or rejected. |
| More useful when there are no formal tradeoffs or meta-structure | More useful when there is a clear cost/benefit tradeoff and a governing body |