R code to extract data from individual datasets and combine them into a single harmonized dataset ready for analysis
My academic research largely consists of identifying datasets for health research, harmonizing them, and combining (pooling) the individual datasets to analyze them together. This means combining datasets across populations, study sites, or countries. It also means harmonizing variables so that they can be analyzed together. In other words, I work in the data pooling field, where I have worked full time since 2017.
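As a minimal sketch of what harmonizing and pooling mean in practice, consider two toy datasets that record the same concepts in different ways. Everything here is invented for illustration: the datasets `site_a` and `site_b`, their column names, and the assumed coding of `gender` (1 = male, 2 = female) are hypothetical and do not come from any real study. The sketch uses only base R.

```r
# Hypothetical toy data: two study sites recording the same concepts differently.
site_a <- data.frame(
  id        = 1:3,
  sex       = c("M", "F", "F"),
  weight_kg = c(70.5, 62.0, 80.2)
)

site_b <- data.frame(
  pid       = 1:2,
  gender    = c(1, 2),           # assumed coding: 1 = male, 2 = female
  weight_lb = c(154.0, 176.0)    # weight recorded in pounds
)

# Harmonize: map each dataset to the same variable names, codes, and units.
harmonized_a <- data.frame(
  study     = "site_a",
  id        = site_a$id,
  sex       = ifelse(site_a$sex == "M", "male", "female"),
  weight_kg = site_a$weight_kg
)

harmonized_b <- data.frame(
  study     = "site_b",
  id        = site_b$pid,
  sex       = ifelse(site_b$gender == 1, "male", "female"),
  weight_kg = site_b$weight_lb * 0.45359237  # pounds to kilograms
)

# Pool: stack the harmonized datasets into one data frame.
pooled <- rbind(harmonized_a, harmonized_b)
print(pooled)
```

Keeping a `study` column in the pooled data makes it possible to trace each row back to its source and to adjust for the study in later analyses.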
I’ll outline the methodology I follow to extract data from individual datasets and to combine them into one pooled dataset ready for analysis. This draws on more than seven years of experience working in academic environments around the world. This story includes code in R.
Data pooling: what is it?
In most settings, we either collect new data (primary data collection) or work with a single dataset that is already available for analysis. That single dataset may come from one hospital, a specific population (e.g., an epidemiological study conducted in a community), or a health survey conducted throughout a country (i.e., nationally representative health survey…