CIS 399 Lecture, March 24 - Shared screen with speaker view
Notes and slides
now we only see the clock; it was fixed for a second
you can choose a specific window in Zoom screen share
I think if you just check “current slide” it will work
Instead of the entire screen
Is it reasonable to assume that the sample groups are independent? It seems likely that if there were a person A whose family was open to reporting A's death, and a person B whose family was afraid to report B's death, then person A would be more likely to appear in both sample groups, and person B would be more likely to appear in neither. This would indicate that these two groups aren't really independent (in particular, this example would cause an underestimate of the number of people). For the fish example, the analogy would be: what if some fish are more prone to being caught by fishermen (and others are very good at avoiding capture)? This would also result in an underestimate of the true number of fish.
Yes Wes, I think that’s a very valid point… independence seems more reasonable in the fishing case than in reporting killings.
But note that even a known underestimate could be useful
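A minimal simulation sketch of the unequal-catchability point, assuming a lake of 1000 fish, two fishing passes in which each fish is caught independently with its own fixed probability, and the standard Lincoln-Petersen capture-recapture estimate n̂ = k·m/r (k fish tagged and released in the first pass, m fish caught in the second pass, r of those already tagged). The catch probabilities 0.275, 0.5 and 0.05 are made up for illustration; both lakes yield about 275 fish per pass on average.

```python
import random


def lincoln_petersen(tagged, second_catch, recaptured):
    """Capture-recapture estimate n_hat = k * m / r; None when r == 0."""
    if recaptured == 0:
        return None  # no tagged fish recaught, so the ratio is undefined
    return tagged * second_catch / recaptured


def one_survey(catch_probs, rng):
    """Tag and release every fish caught in a first pass, then fish a second time.

    Assumption for illustration: in each pass, fish i is caught independently
    with its own fixed probability catch_probs[i].
    """
    first = {i for i, p in enumerate(catch_probs) if rng.random() < p}
    second = {i for i, p in enumerate(catch_probs) if rng.random() < p}
    return lincoln_petersen(len(first), len(second), len(first & second))


def mean_estimate(catch_probs, trials, seed=0):
    """Average the estimate over many simulated surveys (skipping undefined ones)."""
    rng = random.Random(seed)
    estimates = [one_survey(catch_probs, rng) for _ in range(trials)]
    estimates = [e for e in estimates if e is not None]
    return sum(estimates) / len(estimates)


N = 1000
# Every fish equally catchable: 0.275 per pass.
homogeneous = [0.275] * N
# Half the fish are easy to catch, half are good at avoiding capture
# (same expected catch of 275 fish per pass).
heterogeneous = [0.5] * (N // 2) + [0.05] * (N // 2)

homo = mean_estimate(homogeneous, trials=200)
hetero = mean_estimate(heterogeneous, trials=200)
print(f"true N = {N}; equal catchability: {homo:.0f}; unequal catchability: {hetero:.0f}")
```

With equal catchability the average estimate should land near 1000. When half the fish are hard to catch, the easy-to-catch fish tend to show up in both passes, r is inflated, and the estimate falls to roughly 600: the underestimate described in the question.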
Richard J Li
Do you have duplicates of the letters?
My question as well
Not sure what you mean, but I think we’re assuming sampling with replacement here, i.e. you catch a fish and tag it and then return it to the lake, so you can catch it again
My concern would be that recording the label of the letter would not be equivalent to "tagging" if there are duplicates.
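A small sketch of why matching on a recorded label is not equivalent to tagging, using made-up samples of (individual_id, label) pairs in which different individuals can share a label. The data and the helper name `lincoln_petersen` are hypothetical; the estimate is the usual capture-recapture ratio k·m/r.

```python
def lincoln_petersen(tagged, second_catch, recaptured):
    """Capture-recapture estimate n_hat = k * m / r (undefined when r == 0)."""
    return tagged * second_catch / recaptured


# Hypothetical samples: ids 0 and 4 are different individuals who share the
# label "A"; ids 2 and 6 share "C". Only id 1 truly appears in both samples.
first = [(0, "A"), (1, "B"), (2, "C"), (3, "D")]
second = [(4, "A"), (1, "B"), (5, "E"), (6, "C")]

# True recaptures: the same individual seen in both samples (real tagging).
r_identity = len({i for i, _ in first} & {i for i, _ in second})

# "Recaptures" when we can only match on the recorded label.
first_labels = {label for _, label in first}
r_label = sum(1 for _, label in second if label in first_labels)

est_identity = lincoln_petersen(len(first), len(second), r_identity)
est_label = lincoln_petersen(len(first), len(second), r_label)
print(r_identity, r_label)      # 1 3
print(est_identity, est_label)  # 16.0 5.333333333333333
```

Matching on labels counts two spurious recaptures ("A" and "C"), which inflates r and pulls the population estimate down, the same direction of error as in the independence discussion earlier in the chat.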
How does the size of the sample affect the accuracy?
Richard J Li
makes sense, thank you
DC: in the fishing version, if the unknown population size is n and you’ve tagged k fish, then the probability that a recaptured fish is tagged is k/n. When k is small (so k/n is small), estimating k/n is very noisy and therefore less accurate; when k/n is larger, e.g. 0.3, you get a very reliable estimate. So the expected fraction of tagged fish in the recapture is always k/n, but the estimate gets more accurate with larger k. Make sense?
Yes, thank you!
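A rough simulation of the answer about sample size, assuming a lake of n = 1000 fish in which fish 0..k-1 are the tagged ones and a second catch of m = 100 distinct fish drawn uniformly at random. It reports how far the observed recapture fraction r/m typically strays from k/n, relative to k/n. The values of n, m and k are illustrative assumptions, not from the lecture.

```python
import random
import statistics


def relative_error_of_fraction(n, k, m, trials, seed=0):
    """Relative spread of the observed recapture fraction r/m around k/n.

    Fish 0..k-1 are tagged; each trial draws a second catch of m distinct
    fish uniformly at random from the lake of n fish.
    """
    rng = random.Random(seed)
    true_fraction = k / n
    fractions = []
    for _ in range(trials):
        second = rng.sample(range(n), m)
        recaptured = sum(1 for fish in second if fish < k)  # tagged fish in catch
        fractions.append(recaptured / m)
    # Standard deviation of r/m, expressed as a fraction of the true k/n.
    return statistics.pstdev(fractions) / true_fraction


n, m = 1000, 100
errors = {k: relative_error_of_fraction(n, k, m, trials=2000) for k in (20, 100, 300)}
for k, err in errors.items():
    print(f"k = {k:3d}: typical relative error of r/m is {err:.2f}")
```

Because n̂ = k·m/r = k/(r/m), a noisy recapture fraction translates directly into a noisy estimate of n. Under these assumptions the hypergeometric variance predicts a typical relative error of about 0.66 for k = 20 versus about 0.15 for k = 300, and the simulation should come out close to those values.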
If you are estimating n, how can you create a sample that is 30% of n ahead of time (and how do you know whether your sample is noisy or good)?