CIS 399 Lecture, March 24 - Shared screen with speaker view
Matthew Jortberg
37:53
Notes and slides
Wesley Gill
38:26
it's fixed
Wesley Gill
38:37
now we only see the clock; it was fixed for a second
Matthew Jortberg
38:43
you can choose a specific window in zoom screen share
Nadia
38:48
I think if you just check “current slide” it will work
Matthew Jortberg
38:48
Instead of entire screen
Wesley Gill
01:20:31
Is it reasonable to assume that the sample groups are independent? Suppose person A's family was open to reporting A's death, and person B's family was afraid to report B's death. Then A would be more likely to appear in both sample groups, and B would be more likely to appear in neither. That suggests the two groups aren't really independent (and in this example the dependence would cause an underestimate of the number of people). The fish analogy would be: what if some fish are more prone to being caught by fishermen, and others are very good at avoiding capture? That would also result in an underestimate of the true number of fish.
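The underestimate described in this message can be checked with a small simulation. This is only a sketch under assumptions that are not from the lecture: the function name `two_list_estimate`, the per-person probabilities (0.3 for everyone, versus half the population at 0.5 and half at 0.1), and the population size are all hypothetical. It uses the usual two-list estimate |A| * |B| / |A and B|.

```python
import random

def two_list_estimate(n, probs, seed=0):
    """Estimate n from two lists via |A| * |B| / |A and B|.

    probs[i] is a hypothetical chance that person i lands on a list.
    The same chance applies to both lists, with separate random draws.
    """
    rng = random.Random(seed)
    in_a = in_b = in_both = 0
    for i in range(n):
        a = rng.random() < probs[i]
        b = rng.random() < probs[i]
        in_a += a
        in_b += b
        in_both += a and b
    # With no overlap the estimate is undefined.
    return in_a * in_b / in_both if in_both else None

n = 10_000
uniform = [0.3] * n                           # everyone equally likely to be listed
mixed = [0.5] * (n // 2) + [0.1] * (n // 2)   # "person A" types and "person B" types

estimate_uniform = two_list_estimate(n, uniform)
estimate_mixed = two_list_estimate(n, mixed)
print("uniform:", round(estimate_uniform))
print("mixed:  ", round(estimate_mixed))
```

Both populations put about 30% of people on each list, but in the mixed case the overlap is inflated (about 0.13 n instead of 0.09 n), so the estimate should come out at roughly 0.69 n, i.e. too low.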
Michael Kearns
01:21:48
Yes Wes, I think that’s a very valid point… independence seems more reasonable in the fishing case than in reporting killings.
Michael Kearns
01:22:37
https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation
Michael Kearns
01:23:40
But note that even a known underestimate could be useful
Wesley Gill
01:24:04
good point
Richard J Li
01:24:58
yes
Matthew Jortberg
01:25:12
yes
Emily
01:25:44
Do you have duplicates of the letters?
ZV
01:25:50
My question as well
Nolan Hendrickson
01:26:11
^
Michael Kearns
01:26:38
Not sure what you mean, but I think we’re assuming sampling with replacement here, i.e. you catch a fish and tag it and then return it to the lake, so you can catch it again
ZV
01:27:19
My concern would be that recording the label of the letter would not be equivalent to "tagging" if there are duplicates.
DC
01:27:27
How does the size of the sample affect the accuracy?
Richard J Li
01:27:27
makes sense thank you
Michael Kearns
01:29:44
DC, in the fishing version, if the unknown population size is n and you’ve tagged k fish, then the recapture probability is k/n. When k/n is small, the estimate of k/n is very noisy and so less accurate; when k/n is larger, e.g. 0.3, you have a very reliable estimate. So the expectation is always k/n, but the estimate gets more accurate with larger k. Make sense?
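A rough way to see the effect of k is to simulate the recapture step. This is a sketch, not anything shown in lecture: the helper names, the population size n = 1000, and the recapture sample size m = 200 are assumptions chosen for illustration. It samples with replacement and measures how much the observed recapture fraction varies relative to its true value k/n.

```python
import random
import statistics

def recapture_fraction(n, k, m, rng):
    """Catch m fish with replacement; return the fraction that carry a tag."""
    tagged = sum(rng.randrange(n) < k for _ in range(m))
    return tagged / m

def estimate_population(k, fraction):
    """Invert fraction ~ k/n to get n ~ k / fraction (None if no recaptures)."""
    return k / fraction if fraction > 0 else None

def relative_spread(n, k, m, trials=500, seed=0):
    """Std. dev. of the recapture fraction, relative to its true value k/n."""
    rng = random.Random(seed)
    fractions = [recapture_fraction(n, k, m, rng) for _ in range(trials)]
    return statistics.stdev(fractions) / (k / n)

n, m = 1000, 200
for k in (20, 300):  # k/n = 0.02 versus k/n = 0.3
    print("k =", k, "relative spread =", round(relative_spread(n, k, m), 2))
```

The relative spread for k = 20 should be several times larger than for k = 300, which matches the point that the expectation is k/n either way but the estimate is much noisier when few fish are tagged.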
DC
01:31:24
Yes, thank you!
Wesley Gill
01:40:23
if you are estimating n, how can you create a sample that is 30% of n ahead of time (and how do you know if your sample is noisy or good)?
Michael Kearns
01:46:06
http://rob.schapire.net/papers/good-turing.pdf