
Data Mining V Sample Selection Bias

  • This topic has 3 replies, 2 voices, and was last updated Jan-21 by cfachris.
  • pcunniff
    Participant

    Hello!

    Does anyone know a good way to distinguish data mining bias from sample selection bias? They seem very similar. Your feedback would be much appreciated!

    Source: Kaplan

    Patrick

    cfachris
    Participant

    Hey @pcunniff, sample selection bias comes from EXCLUDING a subset of the population's data, so the sample isn’t truly random or representative of the population.

    Data mining, by contrast, is just blindly searching the dataset for highly correlated patterns/relationships (“fitting”), without a proper economic rationale for them in the first place.

    pcunniff
    Participant

    @cfachris so data mining is NOT sampling. Is that the key diff?? I hate when definitions can be almost the same.

    cfachris
    Participant

    ugh I know what you mean, it can be confusing.

    Yes, data mining is NOT sampling. Sample selection bias is exactly what it sounds like: the sample gets CHOSEN in a non-random way (a subset of the population's data is excluded), so it shows biased results that cannot be relied on.
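
    To make that concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the population of fund returns, the 5% mean, the 15% volatility, and the -10% cutoff are all assumptions, not figures from Kaplan. It just shows that dropping part of the population shifts the sample average.

```python
import random
import statistics


def simulate_selection_bias(n_funds=10_000, cutoff=-0.10, seed=42):
    """Compare a full (hypothetical) population with a sample that excludes losers."""
    rng = random.Random(seed)
    # Hypothetical population: annual fund returns drawn from N(5%, 15%).
    population = [rng.gauss(0.05, 0.15) for _ in range(n_funds)]
    # Biased sample: any fund that lost more than 10% is left out,
    # so the sample is no longer representative of the population.
    sample = [r for r in population if r > cutoff]
    return statistics.mean(population), statistics.mean(sample), len(sample)


pop_mean, sample_mean, n_kept = simulate_selection_bias()
print(f"population mean: {pop_mean:.2%}")
print(f"biased sample mean: {sample_mean:.2%} (from {n_kept} funds)")
```

    The sample mean comes out higher than the population mean purely because of which funds were allowed into the sample, not because of anything you searched for in the data.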

    Data mining, on the other hand, starts from a relevant dataset but then tries out a random mix of variables until some of them correlate significantly with it, without an economic rationale behind them in the first place.

    To give an extreme example: suppose the weather in a local town somehow correlates highly with a person’s income. Economically that doesn’t make sense as a theory, but say it comes out statistically significant in a dataset for some random reason and gets included in the model. Unless you’re a fisherman, in which case it may make sense…
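
    If it helps, here is a quick sketch of how that kind of "weather vs. income" result can show up by chance. It is only an illustration under my own assumptions: the income series and the 200 candidate variables are pure random noise, and 0.279 is roughly the two-tailed 5% cutoff for a correlation with 50 observations. None of these numbers come from the curriculum.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(n_obs=50, n_candidates=200, critical_r=0.279, seed=7):
    """Count unrelated random variables that still pass a 5% significance cutoff."""
    rng = random.Random(seed)
    # Hypothetical "income" series: random noise with no real drivers.
    income = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        # Each candidate (think "local weather") is independent noise as well.
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson(income, candidate)) > critical_r:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 200 unrelated variables look 'significant' at the 5% level")
```

    With a 5% cutoff you would expect something like 10 of the 200 to pass by luck alone. Keeping only those "winners" and building a model on them is the data mining problem: the full dataset was used, but the relationship was found by searching rather than by theory.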
