  • Peter

Healthcare Data Book Recommendations

Work and life have been pretty busy recently, and I haven't been as diligent in reviewing healthcare data science books as I intended. Instead, I wanted to put together a list of the books I find really useful and recommend when I teach. These are also the books I point people to in the 'Introduction to health care data' book draft I'm slowly putting together, as references for when you're ready to go deeper on a specific topic. Also, my focus here is on health care data topics and not clinical medicine or software platforms. Several of these books are imperfect, but each is either the only book I've ever found on a topic or the best introduction to it. I still intend to write more reviews/impressions of the many, many other 'healthcare' data science books you can now find on Amazon.


Of note: these books are really focused on American healthcare -- I'd imagine portions of these books are applicable to non-American healthcare systems, but I can't say for sure.


Also of note: Yes, several of these are SAS-based books. That's because they are the only books I've ever found on their particular topics. If there were an R or Python book on any of these topics, I'd look at recommending it.



Understanding American Healthcare

My first pick is foundational to understanding health care - I really remain committed to the idea that to understand healthcare data, it's necessary to have a good understanding of healthcare.


There may be more concise books, but 'The Social Transformation of American Medicine' is truly biblical. I haven't read it in-depth since before The Affordable Care Act was passed, and I'm not sure if new editions have been released to cover this most recent era of policy changes and developments, but, at least for the time period it covers, this book is truly comprehensive on the history and development of the modern American healthcare system.


In a data context, I like to tell people this book answers many of the 'why?' questions that come up. Why do we have two totally different reimbursement systems within the healthcare system? Why are insurance benefits largely an employment benefit? Why did America choose to create a unified, socialized form of insurance for some populations? This book gives you the historical context of 'why' that I think is crucial to understanding the bigger picture when working on a healthcare data science project.





Intersection of healthcare process & data

A huge factor in healthcare data science is understanding the processes of healthcare AND how those processes become a myriad of data. To me, this is likely the most fundamental knowledge to gain to work in healthcare data science, but, so far, I've never found a book that really attempts to do this with any breadth. Ultimately, this is what the book proposal I'm slowly working on hopes to do.


'Hacking Healthcare' is a book I recently found when taking a fresh survey of books focused on health data. It seems to have originally been intended for a health IT audience with some focus on Meaningful Use implementation, but from the bit I've read, it appears to explain healthcare processes from a data perspective and go beyond 'these are ICD codes - they are used for diagnoses', which is where many books ostensibly on healthcare seem to stop. Once my copy of this book arrives and I can review it fully, I'll update my assessment. Similarly, if I find a book that does this better, I'll update this section.





Healthcare Reimbursement

I'll be honest, I don't love this book, but it's really about the only book I've found that actually goes over the mechanics of American healthcare reimbursement. You can find dozens of books on healthcare finance or financial management, which are essentially useless when you're trying to aggregate a table of revenue cycle transactions into a derived claim header structure. There are surprisingly few books on this topic given its importance and complexity.


My problem with 'Principles of Healthcare Reimbursement' is that it doesn't really teach concepts, so much as explain in effusive detail a variety of different reimbursement scenarios. Do you want to know the bigger picture of why American healthcare chooses to incentivize certain behaviors and minimize others through reimbursement? This is not the book that will help you. However, if you need to know the 14 steps to generating a per diem bill for an inpatient psychiatric stay...this will get you there.


To be fair, this book isn't really intended for data scientists or researchers - it's an AHIMA book intended for people in HIM departments and roles. So, with that clerical and administrative perspective in mind, the content makes more sense. Still, while it's not perfect, it's the best book I've been able to find on this topic.
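
To give a sense of what I mean by aggregating revenue cycle transactions into a claim header, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the field names, the transaction types ('charge', 'payment', 'adjustment'), the sign convention, and the made-up rows are hypothetical and don't come from any real billing system or from the book. Real revenue cycle data is far messier (reversals, reposted payments, multiple payers), so treat this as the shape of the problem and not a working solution.

```python
from datetime import date

# Hypothetical line-level transactions: (claim_id, service_date, txn_type, amount).
# Assumed convention: charges are positive, payments and adjustments are negative.
transactions = [
    ("C001", date(2020, 1, 5), "charge", 200.00),
    ("C001", date(2020, 1, 6), "charge", 150.00),
    ("C001", date(2020, 2, 1), "payment", -120.00),
    ("C001", date(2020, 2, 1), "adjustment", -180.00),
    ("C002", date(2020, 1, 9), "charge", 90.00),
]


def build_claim_headers(rows):
    """Roll line-level transactions up to one summary record per claim."""
    headers = {}
    for claim_id, service_date, txn_type, amount in rows:
        header = headers.setdefault(claim_id, {
            "claim_id": claim_id,
            "first_service_date": None,
            "last_service_date": None,
            "total_charges": 0.0,
            "total_payments": 0.0,
            "total_adjustments": 0.0,
        })
        if txn_type == "charge":
            header["total_charges"] += amount
            # Only charge lines define the service date span of the claim.
            first = header["first_service_date"]
            last = header["last_service_date"]
            header["first_service_date"] = (
                service_date if first is None else min(first, service_date)
            )
            header["last_service_date"] = (
                service_date if last is None else max(last, service_date)
            )
        elif txn_type == "payment":
            header["total_payments"] += abs(amount)
        elif txn_type == "adjustment":
            header["total_adjustments"] += abs(amount)

    # Derived field: what is still owed after payments and adjustments.
    for header in headers.values():
        header["balance"] = round(
            header["total_charges"]
            - header["total_payments"]
            - header["total_adjustments"],
            2,
        )
    return headers


claim_headers = build_claim_headers(transactions)
for header in claim_headers.values():
    print(header)
```

Even in this toy version you have to make reimbursement decisions (which transaction types count as payments, which lines define the service dates), and that is exactly the knowledge the finance textbooks don't give you.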






Claims Data

A few years ago, I was in a curriculum planning session for deciding what classes should be taught or developed on healthcare databases at our university. One senior faculty member dismissed the idea of formally teaching claims data - they firmly felt learning claims data should just be an apprenticeship or part of a dissertation like it's always been. This is just crazy to me - claims analysis has formed the basis for large portions of healthcare research and operations for lots of entities. I can also say from experience it's surprisingly difficult to find someone willing to work outside of an insurance company or public health research who actually understands claims data deeply in any facet. Needing content to train new hires on claims data led me to search exhaustively for any resources on what, again, seems like it should be a topic with lots of books. Spoilers - there are not.


Craig Dickstein, with a few coauthors, wrote an initial book on healthcare data from a SAS perspective many years ago, and 'Administrative Healthcare Data' is a comprehensive update to that book. The examples and code are SAS-based, but the majority of the content is universally applicable. Interestingly, there also just aren't these types of focused healthcare topic books for R or Python, from what I've seen.


Of note, I'd say this is a good (and the only) book I've found written generally on claims data, but because governmental claims data are somewhat different from commercial claims, this book may not help that much if you're working in Medicare or Medicaid claims data.







Medicare Claims Data

Here's the governmental follow-up to my previous pick for claims data. There are definitely more resources for working with governmental claims data, especially Medicare, but it's nice to have a book written purposefully instead of poring through dozens of PDFs and ResDAC pages trying to piece together your understanding. I'm a bit surprised given the number of people doing this work, but I have yet to see a modern book on analyzing governmental claims in R or Python.








EHR Data & Analysis

This was a brand new book just released when I started teaching my EMR data class in 2017, and, again, this is the only book I've found on the topic of analyzing EHR data. You can find tons of books that have EHR or some derivation in their title, but I have yet to find another book that actually talks about analyzing that data.


That being said, I do think this book has some limitations. To me, it's starting from a pretty advanced place - it's not an introduction to the topics of using EHR or real world data. I also didn't feel like it did as much in the way of explaining the functionality and resulting data structures in EHRs as I really wanted. I really wanted a book that focused on that intersection of healthcare process and data - much of this book follows the academic paradigm of a bit of didactic content and then a vignette of an experimental design from an analysis that's been published. I was happy to use this book for my course with graduate students, but I'm not sure this book is going to get new data scientists up to speed on working with EHR data.








Introduction to warehousing & data structures

There are a million books on data warehousing and relational databases, but someone recommended this book to me early in my career and I've rarely needed to look for any other books on the topics. This book, again, despite being older and SAS-focused, does a really fantastic job of introducing and focusing on data warehousing and analytics-relevant architecture concepts without trying to get you up to a DBA level of knowledge.


If you already are a data engineer or intend on being a BigQuery / Spark / Scala pipeline engineer, granted, this book isn't going to cut it for you. But, especially early in my career when I was just learning architecture, I often found that I needed a bit of engineering or custom architecture for a data science project I was working on, and this book was approachable for that without having to first go determine my ACID compliance or plan out my normal form.







Healthcare Data Anonymization

I've actually only recently found this book, and, like many of the books on this list, this is the only book I've actually found on this topic. I'm still reading it, but I can already recommend it to at least round out a healthcare data science library. I'm at the point of designing large-scale datasets to be shared for innovation and will need to do extensive anonymizing, but even if you're not anonymizing data, this book is good for helping you to think deeply and critically about HIPAA compliance and related factors. If I had a complaint so far, it would be that some portions of the book are devoted to what I'd call data governance or data project review and planning, which I'm not particularly interested in compared to the technical aspects, but I understand why it's included for completeness' sake on this topic.

