Hey @pcunniff, sample selection bias means a subset of the population's data is EXCLUDED from the sample, so the sample isn't truly random or representative of the population.
Data mining, on the other hand, is just blindly searching the dataset for highly correlated patterns/relationships (“fitting”), without any economic rationale behind them in the first place.
Ugh, I know what you mean, it can be confusing.
Yes, data mining is NOT sampling. Sample selection bias is exactly what it sounds like: you CHOOSE which data goes into the sample (by excluding a subset of the population), so the results are biased and can't be relied on.
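If it helps to see the sample selection part in numbers, here's a rough Python sketch. Everything in it is made up for illustration: the "fund returns" population, the -5% cutoff, and the sample size are my assumptions, not anything from the curriculum.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 annual fund returns in %, centred near 5
population = [random.gauss(5.0, 10.0) for _ in range(10_000)]

# Random sample: every member of the population can be picked
random_sample = random.sample(population, 500)

# Selected sample: anything at or below -5% is EXCLUDED
# (assumed cutoff, e.g. funds that closed and dropped out of the database)
selected_sample = [r for r in population if r > -5.0]

pop_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
selected_mean = statistics.mean(selected_sample)

print(f"population mean:      {pop_mean:.2f}")
print(f"random sample mean:   {random_mean:.2f}")
print(f"selected sample mean: {selected_mean:.2f}")
```

The random sample's mean should land close to the population mean, while the selected sample's mean comes out too high, because the worst returns were never allowed in.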
Data mining, on the other hand, uses a relevant dataset, but instead of testing a theory you keep trying a random mix of variables until some of them correlate significantly with it, without an economic rationale behind them in the first place.
To give an extreme example: say the weather in a local town somehow correlates highly with a person’s income. Economically that doesn’t make sense as a theory, but suppose it comes out statistically significant in the dataset for some random reason and gets included. Unless you’re a fisherman, in which case it might actually make sense…
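Here's a rough Python sketch of how that kind of fluke turns up when you mine enough variables. All of it is hypothetical: "income" and the 1,000 candidate variables are pure random noise with no relationship to each other by construction, and 0.361 is the approximate two-tailed 5% critical value for a correlation with 30 observations.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

N_OBS = 30           # observations per variable (assumed)
N_CANDIDATES = 1000  # unrelated variables we "mine" through (assumed)
CRITICAL_R = 0.361   # approx. 5% two-tailed critical |r| for n = 30


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# The "income" series is pure noise
income = [random.gauss(0, 1) for _ in range(N_OBS)]

# So are all the candidate explanatory variables (weather, etc.)
candidates = [[random.gauss(0, 1) for _ in range(N_OBS)]
              for _ in range(N_CANDIDATES)]

correlations = [pearson(c, income) for c in candidates]
significant = [r for r in correlations if abs(r) > CRITICAL_R]
best_r = max(correlations, key=abs)

print(f"'significant' variables found by chance: {len(significant)}")
print(f"strongest correlation found: {best_r:.2f}")
```

At a 5% significance level you'd expect about 5% of the useless candidates to pass the test anyway, which is why a result found by searching needs an economic story behind it before you trust it.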