A CFA Level 1 Discussion About Data Mining V Sample Selection Bias

This topic has 3 replies, 2 voices, and was last updated Jan-21 10:50 pm by cfachris.

Author

Posts
- pcunniff
  Participant
  - CFA Level 1
  13 Jan 2021 at 3:08 am
  Upvotes: 8
  Hello!
  
  Does anyone know a good way to tell data mining apart from sample selection bias? They seem very similar. Any feedback would be much appreciated!
  
  Source: Kaplan
  
  Patrick
- cfachris
  Participant
  - CFA Level 3
  13 Jan 2021 at 9:53 pm
  Upvotes: 3
  Hey @pcunniff - sample selection bias EXCLUDES a subset of the data in the population, so the sample isn't truly random or representative of the population.
  
  Data mining, on the other hand, is just blindly searching a dataset for highly correlated patterns/relationships ("fitting"), with no proper economic rationale behind them in the first place.
- pcunniff
  Participant
  - CFA Level 1
  19 Jan 2021 at 10:41 am
  Upvotes: 3
  @cfachris so data mining is NOT sampling. Is that the key diff?? I hate when definitions can be almost the same.
- cfachris
  Participant
  - CFA Level 3
  20 Jan 2021 at 10:50 pm
  Upvotes: 3
  Ugh, I know what you mean, it can be confusing.
  
  Yes, data mining is NOT sampling. Sample selection bias means exactly that: you CHOOSE which data ends up in the sample (by excluding a subset of the data in the population), so the sample is no longer random, shows biased results and cannot be relied on.
  
  Data mining, on the other hand, uses a relevant dataset, but keeps trying a random mix of variables until some of them correlate significantly with it, without an economic rationale behind them in the first place.
  
  To give an extreme example: suppose the weather in a local town somehow correlates highly with a person's income. Economically that doesn't make sense as a theory, but say it comes out statistically significant in a dataset purely by chance and gets included anyway. Unless you're a fisherman, perhaps, in which case it might make sense…
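
For anyone who wants to see the two biases side by side, here is a rough sketch on made-up data. Everything in it is an assumption for illustration only (the simulated income distribution, the 50,000 cutoff, the 200 candidate variables, the approximate critical t-value of 1.98, and the function names); none of it comes from Kaplan or the CFA curriculum. The first function excludes part of the population before averaging (sample selection bias), and the second keeps the full dataset but tests many unrelated random series against it until some look significant (data mining).

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sample_selection_demo(rng, population_size=10_000, cutoff=50_000):
    """Return (population mean, mean of a sample that excludes low incomes)."""
    # Hypothetical income population; the parameters are arbitrary.
    incomes = [rng.lognormvariate(10.8, 0.5) for _ in range(population_size)]
    # Selection bias: everyone below the cutoff never makes it into the sample.
    kept = [x for x in incomes if x >= cutoff]
    return sum(incomes) / len(incomes), sum(kept) / len(kept)


def data_mining_demo(rng, n_obs=100, n_candidates=200, t_crit=1.98):
    """Count unrelated random series that look 'significant' against income."""
    income = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        # Each candidate (think "local weather") is unrelated by construction.
        noise = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson_r(noise, income)
        # t-statistic for a correlation coefficient with n_obs - 2 degrees of freedom.
        t_stat = r * math.sqrt((n_obs - 2) / (1 - r * r))
        if abs(t_stat) > t_crit:  # roughly the 5% two-sided cutoff (assumed)
            hits += 1
    return hits


if __name__ == "__main__":
    rng = random.Random(42)
    pop_mean, biased_mean = sample_selection_demo(rng)
    print(f"population mean: {pop_mean:,.0f}")
    print(f"mean after excluding low incomes: {biased_mean:,.0f}")
    print(f"spurious 'significant' variables: {data_mining_demo(rng)} of 200")
```

In the first case the data itself is the problem (part of the population is missing), so the average is pushed upward. In the second case the data is fine, but at a 5% significance level about 1 in 20 unrelated variables would be expected to pass the test by chance, which is why a result found this way needs an economic rationale before it is trusted.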