165 Matching Annotations
  1. Aug 2024
    1. By examining these questions, we aimed to contribute to the understanding of how preventative care can improve public health in the United States.

      I suspect you need to greatly expand on this section. You keep mentioning preventative care, but none of your points above obviously link to preventative care.

    2. Investigate predictability of chronic condition prevalence between public and private insurances.

      Does this map to your research question? It seems a bit separate from preventative healthcare measures?

    3. In the world today, there is a growing problem of increasingly poor health across the globe. While life expectancy has increased, the amount of years spent in poor health has also increased proportionally.¹

      This feels like the start of an introduction section. You can certainly combine the Intro and Background if you want.

    4. endemic

      Is this a term that everyone would be familiar with? Because I'm not sure I could define it off the top of my head. I could hazard a pretty decent guess based on other similar words and the usage here, but...

    5. In our research, we focused on just Medicaid in order to better serve our low-income and traditionally marginalized communities.

      That is totally ok, but I think you could better reflect that in your title / research question.

    6. Within the United States in particular, recent research has revealed not only that “the U.S. [has] the lowest life expectancy among high-income countries, but it also has the highest rates of avoidable deaths” despite spending almost 18% of its Gross Domestic Product (GDP) on healthcare.

      This feels like it absolutely needs a citation. You are even directly quoting.

    7. Since we can expect access to preventative care to be limited for those on public health insurance in the United States

      I'm not sure if I totally follow what you are saying here. By public do you mean social? And if so, shouldn't having the social health insurance give them better preventative care? It wouldn't be limited then?

    8. higher rates of chronic conditions

      What do you mean by chronic conditions? When I think of chronic conditions, I think of many things that an individual has no control over, and thus the rate of the condition wouldn't necessarily vary. But perhaps the downstream effects WOULD vary depending on whether they could get the treatment they needed to manage the condition?

    9. we analyzed only data from 2022 due to data quality concerns when comparing year to year.

      Could you not look for the same trends in the 2021 data though as well? To at least see if you see the same things? Or how bad are the data quality concerns?

    10. Variables we investigated from this dataset included percentage of population on Medicare, percentage of adults who smoke, healthcare expenditure, median annual household income, cervical cancer incidence rate, and many more potentially relevant factors.⁷

      This is just what data you used, so I'm not 100% sure what this citation would take me to. You already cited it above, so I don't think you need it here.

    11. We used text parsing to create a column called demographic and a column called group. This allowed us to parse out demographics like age, race/ethnicity, poverty level and create groups within each of those.

      You absolutely need to elaborate on this.

    12. Again, we used text parsing such as grepl and gsub to fill in our newly created demographic, measure, group, secondary_demographic, and secondary_group, columns.

      Well you gave a bit more detail here, but you still need more.
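
      To make concrete the level of detail I am asking for, here is a minimal sketch of regex-based label parsing. It is written in Python rather than your R (`re.search` and `re.sub` play the roles of `grepl` and `gsub`). The raw label strings and the `parse_label` helper are made up for illustration, since the report does not show the actual strings.

```python
import re

# Hypothetical raw labels: the real strings in the source data are not shown in the report.
RAW_LABELS = [
    "Adults 18-64, Below 100% FPL",
    "Adults 65+, Hispanic",
    "Children 0-17",
]

def parse_label(label):
    """Split one raw label into (demographic, group) pairs, one per comma-separated part."""
    pairs = []
    for part in [p.strip() for p in label.split(",")]:
        if re.search(r"FPL", part):                 # grepl-style test for a poverty label
            demographic = "poverty_level"
            group = re.sub(r"\s*FPL$", "", part)    # gsub-style cleanup of the suffix
        elif re.search(r"\d", part):                # any remaining part with digits is an age band
            demographic = "age"
            group = re.sub(r"^(Adults|Children)\s+", "", part)
        else:                                       # everything else is treated as race/ethnicity
            demographic = "race_ethnicity"
            group = part
        pairs.append((demographic, group))
    return pairs

PARSED = {label: parse_label(label) for label in RAW_LABELS}
for label, pairs in PARSED.items():
    print(label, "->", pairs)
```

      Showing even two or three before-and-after rows like this would tell a reader what the columns contain and how the rules were applied.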

    13. Finally, we added a county column (which we filled with all NA) and edited the state and question fields to replicate the way that data from our CHFF dataset was formatted. This allowed us to merge the two datasets.

      Talking about all of this in the abstract is really difficult for a reader to follow, especially because they still have no idea what the actual columns in your data were. Some visuals would go a long way to making this more clear.

    14. 4.1

      While I think tabsets can be great, I'm not sure if they are a good fit here. But I also don't understand what exactly it is that you are trying to showcase or say here, or why some goals have so much more attached to them here.

    15. Figure 1: ERD.

      All of your types are still text here.

      Plus, you need a much more fleshed out caption on your figures.

      Finally, you need to reference the figure in the text.

      PS: the name of the measurement table doesn't seem to indicate what it is holding, which I think are types of units?

    16. Process outlines:

      You need to actually introduce this section with some text. In fact, you should pretty much always have text coming immediately after a section heading.

    17. 4.2 Statistical Thinking

      While these are the main sections that I am looking for as a grader, I do not think breaking things up into these sections makes the most sense to a reader. You don't need to specifically call these things out. I'll find them, don't worry. Instead, you should focus on telling your story.

    18. ensembles, logistic regression, random forest, naive bayes, PCA, KMeans, Gaussian Mixture Models

      Just off the top of my head, this feels potentially excessive. You should be able to narrow your focus down to just 1 or 2 that make the most sense for what you are trying to model.

    19. Those who are not using well-known forms of insurance, such as Direct Primary Care, or who are not seeking traditional medical care are not represented.

      This is good, but I think it is the only direct reference to your own data. I would seek to tie more of what you are saying back to what you specifically did or to what your data specifically utilized.

    20. What are these models and tools?

      I like the idea behind this reference, but I definitely wouldn't put them in their own subsections. And there is probably too much here, owing to the above comment.

    21. 5.1.1.1 Public Insurance

      I'd probably never go this deep with sections. Sectioning this much is almost more confusing than helpful, because humans can't keep that many layers in their head.

    22. Medicaid-related covariates, such as how many adults did not see a doctor in the past twelve months due to cost, state Medicaid expenditure, state Medicaid spending, total Medicaid spending, and population percentage on Medicaid.

      Why the pivot from looking at Medicaid percentage to all these Medicaid-related covariates?

    23. While Medicaid-related factors may not be able to completely predict diabetes prevalence, it certainly is correlated.

      You don't actually SHOW anyone these results though. You just report them, so a reader has no idea how you did it, or what bias you might be bringing in your reporting of these values.

    24. There may be a slight linear correlation between mean diabetes percentage per state and Medicaid percentage, so we will see how a regression can predict diabetes prevalence.

      Much better caption here than in earlier figures. Though I'm not sure why this is Figure 5A.

    25. Private Insurance

      This absolutely should not be its own section, just another paragraph. Especially if you are trying to say these two figures are 2 parts of 1 whole, they can't be split across sections. And really, I'd put them both on the same figure if you are wanting to compare them this way.

    26. We made another logistic regression model with just diabetes prevalence and percent private insurance per state with only 23% accuracy.

      Is this comparable to what you did earlier? Because it sounds different here.

    27. Though this is lower than the above-mentioned public insurance data, it is important to note that we are using only one predictor in this model- percent of privately insured individuals.

      So if your goal is to compare, why not compare them on equal playing fields?
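
      As a sketch of what "equal playing fields" could look like: fit the same kind of model with the same number of predictors to the same outcome, and report the same metric for both. The example below uses a one-predictor least-squares fit and R² in Python rather than your R. The numbers are toy values I made up purely for illustration, not your data, and `r_squared` is a helper defined here.

```python
def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Toy state-level values, invented for illustration only.
medicaid_pct = [15.0, 18.0, 21.0, 24.0, 27.0, 30.0]
private_pct = [70.0, 66.0, 63.0, 61.0, 55.0, 52.0]
diabetes_pct = [8.9, 9.4, 10.1, 10.0, 11.2, 11.6]

# Same outcome, same model form, one predictor each, same metric.
R2_MEDICAID = r_squared(medicaid_pct, diabetes_pct)
R2_PRIVATE = r_squared(private_pct, diabetes_pct)
print(f"Medicaid-only model R^2: {R2_MEDICAID:.3f}")
print(f"Private-only model R^2:  {R2_PRIVATE:.3f}")
```

      With both models restricted to a single predictor, any difference in fit reflects the predictor rather than the number of covariates.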

    28. Our model accurately predicted CVD prevalence only 15% of the time

      The same model as earlier? 15% being explained by a small dataset when the above was 53% seems like a reach. This feels like a very different response.

      Also, I don't believe you made clear anywhere how much data you were actually working with. That should definitely be included in the data section.

    29. Turning once again to our privately insured individuals, a similar model accurately predicted CVD prevalence only 15% of the time.

      Ok, so that is the same. But you never mention your negative association figure.

    30. list of variables representing access to care

      How extensive is this list? It would be nice to be able to show it or indicate more than just directing them to a CSV.

    31. Fig 2A: Median life expectancy of most states is between 77 and 80 years old.

      Reference the figure in the text before the figure appears.

      Also, though it might be obvious, you should indicate units on your life expectancy.

    32. Again

      Don't use transition words like this in captions, since there is no guarantee that people will be looking at them in any particular order.

    33. lm(formula = `LIFE_EXPECTANCY` ~ `state_name` + `PCP_RATIO` + `%_VACCINATED_(HISPANIC)` + `AVERAGE_DAILY_PM2.5` + `%_FOOD_INSECURE` + `HOUSEHOLD_INCOME_(HISPANIC)_LN` + `SEGREGATION_INDEX` + `%_BROADBAND_ACCESS`, data = ds_life_adjusted)

      I think there is a better way to present this. Also, are these the only variables representing access to care? Or how did you arrive at these?

    34. We decided to keep state in our linear model because it did have a significant correlation with life expectancy.

      Is it just codependent with other more significant factors though? It is hard to say without having seen a list of the other factors.

    35. There does not appear to be a geographic influence on median life expectancy.

      At all? I find it interesting that Utah is the only seemingly green non-border state.

    36. Percentage of Food Insecurity and Segregation Index

      I don't believe these have been explained or defined, so I'm not sure what I should think of them.

    37. Percentage of Access to Broadband Internet was also positively correlated with life expectancy.

      This presumably tracks mostly with rural versus urban, though, correct?

    38. To explore the curves and log rank test results, please explore our gallery below:

      This is fine for presenting the figures, but you still need to discuss them in the text. Otherwise why show them?

    39. Figure 2D shows life expectancy based on the percentage of food insecurity.

      It does? I see Time and probability of survival. What am I missing?

    40. Figure 2E: Log Rank Test Results for Life Expectancy and Percentage of Food Insecurity

      If you are going to show it, you need to talk about it more.

    41. Many of the same states that have longer life expectancies (Fig. 2B) also have lower infant mortality rates.

      If that is a main purpose of this image, I'd put them next to one another.

    42. As expected, those who are more food insecure have lower life expectancies.

      I don't understand how the probability and unitless time (maybe in years?) lead to what you are saying here. Probably because I haven't done much of this analysis, but you need to at least explain how someone should interpret the graph for it to be useful to show them.

    43. significantly correlated

      There are 20 numbers in this table. You should at least mention how you came to these conclusions from the shown data. Don't expect everyone to just instantly see the same thing you did.

    44. Finally, we repeated our machine learning process to predict age-adjusted death rate based on our significant access to care covariates. Our model accurately predicted the binned death rate for 62% of our test data. Just thirteen access-to-care-related covariates predicted mortality rate 62% of the time! This again emphasizes the importance of access to care in one’s overall health status.

      This is less a comment about this paragraph and more the previous section:

      This felt like a huge data analysis dump, and I was unable to keep most of it straight in my head and how exactly it was tied to your research question. There is a TON going on. At the end I have no clear indication what I should be taking away from it. I think you need to work on possibly trimming some of this away, and really fixating on a clear story, which you constantly relate back to.

    45. We knew that the following variables should be included in the creation of this new variable: Life Expectancy, Age-Adjusted Death Rate, Years of Potential Life Lost Rate, Percent of Frequent Physical Distress Days, and Percent of Frequent Mental Distress Days.

      Why did you know this?

    46. We chose the first three for the same reasons explained in the introduction to our results.³

      Yeah I'm going to be honest, that was ages ago and I have 0 recollection of those reasons at this point.

    47. We chose these five variables to summarize our overall health metric since they are all linearly correlated with each other.

      Ok, and why is this important?

    48. We then plotted the weights of all of our other variables against this new one to see how much each contributed to overall health.

      "All your other variables" as in the 5 above? Or a bunch of others?

    49. so we continued by projecting the variables above to this first eigenvalue, thus “creating” a new variable that represents overall health.

      Projecting each of the above variables onto the new basis would have given 5 projections, right? So how is this a single variable? Or are you saying that this basis vector is your new variable, and you are working out the relative amounts each of the above contributes to it?

      And if so, it would have been nice to report the relative weights of these variables towards this eigenvector.
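
      For reference, here is how I would expect this to work if you ran ordinary PCA (that is my assumption, since the report does not say). Each row gets a single score along the first principal component, computed as a weighted sum of the standardized inputs, and the weights are the numbers I would like to see reported. The sketch is in Python rather than your R, uses three toy columns with invented values instead of your five variables, and finds the component by power iteration.

```python
import math

def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def first_component(columns, iters=500):
    """Return (weights, scores) for the first principal component.

    weights: leading eigenvector of the correlation matrix (one weight per variable).
    scores:  one number per row, the weighted sum of that row's standardized values.
    """
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    w = [1.0] + [0.0] * (p - 1)          # start vector for power iteration
    for _ in range(iters):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    scores = [sum(w[j] * z[j][r] for j in range(p)) for r in range(n)]
    return w, scores

# Toy values, invented for illustration only.
life_expectancy = [75.1, 76.4, 77.8, 78.5, 79.9, 80.6]
death_rate = [910.0, 870.0, 820.0, 800.0, 740.0, 720.0]
distress_pct = [14.2, 13.1, 12.5, 11.9, 10.4, 10.0]

WEIGHTS, SCORES = first_component([life_expectancy, death_rate, distress_pct])
print("weights per variable:", [round(v, 3) for v in WEIGHTS])
print("one score per row:   ", [round(v, 3) for v in SCORES])
```

      The overall sign of the component is arbitrary, so you would need to state which direction you treat as "healthier".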

    50. Fig6B shows which features contribute the most in both a positive and negative direction to health.

      Good, so now comment on what they are and whether they make sense.

      County_name is on here. How does that categorical variable project onto a numeric vector?

    51. first eigenvalue

      This is PCA, correct? Would that be the better term to use here? Eigenvalue is very much a math term, and if someone didn't know what you were doing here, it seems to me that they'd be very confused by what you were saying.

    52. broadband internet access and median household income showed significant negative correlations with infant mortality, indicating that better access to resources and economic stability

      Honest question: while I can easily see these as cofactors or as a proxy for access to resources and economic stability, I'm not sure that just access to resources and economic stability equates to preventative healthcare. Or at least I'm not sure that has been shown here?

    53. Adjustments for outliers are necessary to better understand the nuanced relationship between income and death rates within our linear models.

      Should I have noticed this in the middle of the analysis? Was it mentioned somewhere?

    54. Furthermore, our findings related to Diabetes (DM) and Cardiovascular Disease (CVD) reveal significant insights into the impact of Medicaid on chronic conditions. Our linear regression models demonstrated that Medicaid-related factors could predict diabetes prevalence with 38% accuracy and cardiovascular disease prevalence with 46% accuracy. These findings highlight the substantial role Medicaid plays in managing these chronic conditions and underscore the importance of access to healthcare for lower-income populations.

      This feels like an afterthought to the other analysis that was done at this point. Which I don't think it needs to feel that way, but the fact that it was discussed first and then mentioned again last here means it is really tough to remember exactly what was shown and how it relates to your main question.

    55. Supplemental Figures

      There probably isn't much of a point of including these unless you are going to provide much more context for them. Some may be able to be better used in the text to show more convincingly some of the claims you are making though.
