Could another data reality (limitation) be external validity, i.e., sampling bias and missing data? I wonder whether, with big data, we truly have no sampling frame, or do we? Is there a way to map the big data onto a super-population, so that the data are mathematically adjusted to reflect the unknown sampling frame and thus yield an estimate (within a range of extreme values) that has some external validity?
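One common way to do this kind of adjustment is post-stratification: reweight the non-probability sample so its demographic composition matches known population margins (e.g., from a census). Here is a minimal sketch; the cell labels, counts, and population shares are all hypothetical, and this only corrects composition, not selection on unobserved factors.

```python
# Post-stratification sketch for a non-probability ("big data") sample.
# All numbers below are made up for illustration.
from collections import Counter

# Sample records, each labeled with a demographic cell (here, age group).
# The sample is deliberately skewed young.
sample = ["18-34"] * 60 + ["35-64"] * 30 + ["65+"] * 10  # n = 100

# Known population shares for the same cells (e.g., from a census).
population_share = {"18-34": 0.30, "35-64": 0.45, "65+": 0.25}

# Per-cell weight = population share / sample share.
n = len(sample)
sample_share = {cell: cnt / n for cell, cnt in Counter(sample).items()}
weights = {cell: population_share[cell] / sample_share[cell]
           for cell in population_share}

# Weighted estimates now reflect the population's composition -- but only
# under the strong assumption that, within each cell, the sampled units
# are representative of the population.
print(weights)  # 18-34 down-weighted, 65+ up-weighted
```

The estimate's validity still hinges on the within-cell representativeness assumption, which is exactly where an unknown sampling frame can continue to bite.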
Thank you! One other challenge: validity constructs are known to be highly biased by culture, so the data may simply re-emphasize disparities if it is used to guide service delivery. Many "validated" surveys include items with low face validity for populations such as American Indians and disparate groups of Latino background.
A relevant paper on interpretability of ML: https://arxiv.org/pdf/1806.00069.pdf