AP Statistics Sampling Distributions and Central Limit Theorem

Hello, AP Statistics students, and welcome to the College Board virtual instruction for AP Statistics. My name is Luke Wilcox. I am an AP Statistics teacher at East Kentwood High School in Kentwood, Michigan, and it is Thursday, April 23rd. I'd like to start today by saying a big thank you to those teachers and students who wished me a happy birthday. Yesterday was my 41st birthday, and we celebrated in style with a Star Wars-themed birthday. My children made me this excellent Luke Skywalker cake, and then we watched Return of the Jedi. I'm feeling very grateful at age 41 to be where I am today and to be with you all today, so thank you for that.

We do have a lot lined up for today. We're talking about sampling distributions, so I'd like to go ahead and jump right into the lesson. Checking to make sure we're in the right place: this is AP Statistics, and you all know who I am at this point, I hope. A few quick reminders for those who may be here for the first time. This is a daily meeting that we have at two o'clock Eastern Standard Time. We are reviewing for the AP exam, and we are in this week right here, April 20th through the 24th, where we're covering Units 4 and 5 from the College Board CED. That will complete our review of all of the content contained within the CED, which means that next week Mr.
Murphy and I have some special things planned for you, some general review things that I think are going to be super helpful. Then in that final week we're actually going to give you some practice questions that might look very similar to the questions you're going to see on your modified AP exam this year. And of course the big day, the big game, May 22nd, is when you all will be taking your AP Statistics exam for 2020.

A quick note here: all of the lessons that you see on YouTube Live are available to download at statsmedic.com/youtube. You just have to know the date, and you will be able to find the resources and print them out. And a final reminder: if you or another student you know is in need of a device or internet access in order to take that AP exam at home on May 22nd, the College Board wants to make sure that you have that. If you go to cb.org/tech, there's a form that you can fill out, and they will make sure that you are taken care of.

All right, so let's take a look at the homework from the previous lesson. This is an assignment that you got from Mr.
Murphy yesterday, and it's a good lead-in to some of the stuff that we're going to talk about today. Here's what it says: A company sells concrete in batches of five cubic yards. The probability distribution of X, the number of cubic yards sold in a single order of concrete for this company, is shown in the table below. We see there are only a certain number of values possible for X, which are 10, 15, 20, 25, and 30, and then we see the respective probabilities for each one of those values of the random variable X.

The rest of the question says the expected value of the probability distribution of X is 19.25. When they say expected value, they're giving us the mean of X, and then they're also giving us the standard deviation of X. There is a fixed cost to deliver the concrete. The profit Y, in dollars, for a particular order can be described by Y = 75X - 100. What is the standard deviation of Y?

Very important here: the relationship they've given us is that Y is equal to 75X minus 100. Now let's think about how the distribution of Y is going to compare to the distribution of X. Really, there are two things that we're doing to that X value in order to turn it into a Y value. The first step is to take that X value and multiply by 75. We could take each one of these X values and multiply by 75, and what we want to know is what that is going to do to the standard deviation. Well, if you take a list of numbers and you multiply them all by 75, sure enough, that's going to multiply the standard deviation by 75. So we know that in order to get the standard deviation of Y, we're going to have to take 75 times the standard deviation of X.

But now we need to consider the next step, which is to subtract 100. Here's the deal: if you subtract 100 from all of the values in a list, all you're really doing is shifting the distribution 100 units down. That's not going to change the variability, so really
this part here does not have an impact on the standard deviation of Y, and sure enough, the multiplication is all we need in order to find the standard deviation of Y. I'm just going to take 75 times the standard deviation of X, which we know from the problem is 5.76, and we get a standard deviation of Y equal to 432 dollars. Looking at my answer choices over here, I can see that B is the answer I'm looking for. The most common incorrect answer, I will tell you for sure because I've seen students make this mistake, is that they subtract 100, because they think that this minus 100 also subtracts 100 from the standard deviation. That is not true, so that is definitely the incorrect response. The correct answer is B. Hopefully you did okay with that, and I think we are now ready to talk about some new learning for today.

Here is the quick lesson overview. I will start with the learning targets, and there are three of them today. The first two are very parallel, so let's take a look. Learning target number one: describe the sampling distribution of p-hat and then of p1-hat minus p2-hat. This first learning target is all about proportions, and it's about sampling distributions for proportions. Now look at the second learning target: once again sampling distributions, but this time for x-bar and for x1-bar minus x2-bar. This learning target is all about sampling distributions for means. And then a sort of sub learning target underneath that is to understand the central limit theorem, which is going to relate back to the sampling distribution for means. So let's go ahead and jump right in.

Page one of the activity today. The question that we're trying to answer is: do you like Reese's Pieces? I will be honest with you. My 11-year-old daughter Reese, yes, her name is Reese, guess what her favorite candy is? Reese's Pieces. One day she claimed to me that she really likes the orange ones better than the other two. Now
I'm not convinced that that's true, but I started to wonder about the distribution of the colors of Reese's Pieces. I looked it up, and it turns out that the candy manufacturer claims that 40% of the candies are orange, so the colors are not equally distributed. First of all, you can see we are in the world of proportions here. 40% is the proportion being claimed for the whole population of all Reese's Pieces, so let's be careful with our notation. The notation we should be using here is p, for the true proportion for the whole population, as opposed to p-hat, which would be the sample proportion.

We're going to use an applet to simulate taking random samples of 50 Reese's Pieces. This very clearly is our sample size, and we're going to use lowercase n to represent that sample size of 50. Here's the definition of the random variable. It says: let N equal the number of orange candies in a sample. So we're looking for how many out of those 50 are orange.

Question 1 says: use the applet to take 100 samples of 50 and sketch the results below. I'm going to take you now to the applet, and you can just follow along on your screen. You can find this on the internet if you type "Reese's Pieces applet"; this will be the first one that comes up. I've already set our probability of orange to 0.4 and our number of candies to 50. Let's go ahead and take one sample and see what it looks like. What I really like about this applet is that there is an animation, so you can physically see the random sample of 50 Reese's Pieces, and we're putting them into two buckets: the orange ones and the not-orange ones. Then you can see that the applet counts up the number of orange candies, in this case 16 out of the 50, and down here on the bottom you can see it's putting a tiny little dot right at 16. Let me show you one more of these. We're going to take another random sample of 50 Reese's Pieces and count up
the number of orange Reese's Pieces. For this particular sample we got 19, and down here you can see that a dot appears at 19. The nice thing about technology is that we can do many, many samples very, very quickly. We want a hundred total and we've already done two, so I'm going to add 98 more samples, making sure to unclick the animation, otherwise we'd be watching this for the rest of the hour. I'm going to draw those samples real quick, and here is the distribution.

Let's notice a couple of things here. First of all, it appears that the center of this distribution is about 20. Maybe 10 is the lowest value and 29 is the highest value, and I'm starting to see a little bit of a normal distribution here. In fact, I can click this, and it shows me that there is a little bit of a normal distribution appearing there.

Let's go back to the activity and copy this down. Here's the thing: we don't have to copy exactly what we saw. We just need the general idea of what it looked like. Here's what I remember. The tallest stack of dots was at about 20, so I'm just going to draw some dots at 20. I knew the lowest one was maybe about 10 and the highest one was maybe about 29, and it appeared to have a somewhat normal distribution, certainly not perfect. I'm just going to sketch this in, and this just has to be a sketch. It certainly does not have to be exact to what you see in the applet. This is just giving us an idea of what this distribution looks like.

The first question to answer here is: name the distribution. I'm hoping that this looks very similar to the lesson that we went over on Tuesday, where we talked about binomial distributions. Sure enough, this is in fact a binomial distribution. Just as a quick review, to make sure that this is a binomial distribution there are four conditions that need to be checked, and we remember them with the acronym BINS. B stands for binary, so each trial
is a success or failure. In this case a success is orange and a failure is not orange. I stands for independent. We have a random sample, so each trial should be independent, and because we're sampling without replacement, the 10% condition should be checked. Certainly 50 is less than 10% of all Reese's Pieces, so I'm going to say the independent condition is met. N stands for the number of trials being fixed. Here the number of trials is 50, because we're sampling 50 Reese's Pieces. And then S: we have the same probability of success for each trial. The probability of success here, we know, is 0.40, so that condition is satisfied. So we have a binomial distribution, and we're going to be able to use some things from Tuesday to help us answer the rest of the questions for this distribution.

Now, what does each dot represent? Let me take this dot right here, for example. What does that dot represent? It represents a random sample of 50 Reese's Pieces, and then we count the number that are orange. In this case there were 12. This dot represents a different random sample of 50, counting up the number of orange, and this is a different random sample, and this is a different random sample, and this is a different random sample. So to answer the question of what each dot represents: it represents one sample of 50 Reese's Pieces, and what we're going to do with that sample is count the number of orange. We are calling this random variable N.

What we have in this distribution is many, many samples, so there is another name that we could give this distribution, and that is what's called a sampling distribution. In this case each dot represents a different value of N, so I'm going to call this a sampling distribution of N. A little bit of a disclaimer here: in order for this to be the true sampling distribution of N, we must have all possible samples of size 50. Obviously there's like an infinite number of samples of size 50, so we
don't have the true sampling distribution here. But I think that these dots give us a good enough idea about the sampling distribution that I'm comfortable calling it a sampling distribution, even though it's not the true sampling distribution of N.

Now I'd like to be able to describe that sampling distribution, and let's start with the shape. We've already mentioned that we think the shape of this thing is approximately normal. If I remember back to the lesson from Tuesday, when we talked about binomial distributions, there's actually something we can check to make sure that the normal approximation is appropriate here. That's what's called the large counts condition. Remember, the large counts condition says that n times p, which is 50 times 0.40, or 20, and n times 1 minus p, which is 30, both have to be greater than or equal to 10. That's really telling us that the expected number of successes and the expected number of failures are both greater than or equal to 10. Sure enough, they are here, so I'm comfortable saying that this distribution is approximately normal. That takes care of the shape.

Now let's talk about the center of the distribution by thinking about what the mean is. Just by looking at the picture, I can see that the mean is about 20. You can very clearly see that. But if you recall from Tuesday, we did have a formula for calculating the mean of a binomial distribution, which is n times p, and we know n times p over here is equal to 20. That confirms what we see in the dotplot: 20 is the mean of this distribution.

Now let's move to the standard deviation of the distribution. Once again I'm going to go back to our previous lesson on Tuesday. There is a formula for calculating the standard deviation of a binomial distribution, which is the square root of n times p times 1 minus p. All we have to do here is plug in some numbers under the square root: n is 50, p is
0.40, and 1 minus p is 0.60. If you plug that into your calculator, you're going to get a standard deviation of 3.46. One little warning here: when you're plugging in a number for n, you want to be careful that you're using the sample size. Often students will have trouble deciding whether it should be 100 or 50. We want the sample size, not the number of samples, so 50 is the appropriate value for n, which is what we used in our calculations.

We are ready to move on to number 2, which says: let p-hat equal the proportion of orange candies. Notice what's changed here. We are no longer looking for the number of orange candies. We are looking for the proportion of orange candies in the sample. So the question is, what needs to be done to each value of N to change it into p-hat? If I look at my values of N, which represent the number of orange candies, and I want to change each one into a proportion, all I need to do is divide by the sample size, 50. So the answer here is that we just need to divide each N value by 50, and that will change it into a proportion.

Number 3 says: in the applet, change from the number of orange to the proportion of orange. I'm going to take us back to that same applet we were using, and we've still got our dotplot here. What I'm going to do is change from the number of orange candies to the proportion of orange candies, and I want you to watch this dotplot. Watch what changes. Here we go, I'm going to change to proportion. What did you notice? Let me do it again. There's the number of orange candies, and there's the proportion of orange candies. Hopefully what you noticed is that the dots did not change at all. They were exactly the same as they were before. What changed was the values on the x-axis: instead of the number of orange candies, we now have the proportion of orange candies. So let's go ahead and copy that into the activity, and once again
this is just a sketch. Hopefully you noticed in our simulation that 0.40 was right about the center of the distribution, so I know I had my tallest stack of dots there, and I had dots as low as this and maybe as high as this. Really, what I'm trying to do is copy the dotplot from above but put it on the new scale, where we have proportions instead of the number of orange candies. All right, there we go.

Before we name the distribution, I actually want to think about this question first: what does each dot represent? Let's take this dot, for example. That is a random sample of 50 Reese's Pieces and the proportion of orange calculated from that sample. This is a different random sample of 50 and the proportion calculated from that sample, and this is a different sample, and this is a different sample, and this is a different sample. So we're going to say that each dot represents one sample, and what we're going to do with that sample is calculate the proportion of orange. Being careful with our notation, because it's a sample proportion, that would be p-hat. So each dot represents a different sample and the proportion calculated from that sample, and we can call this distribution the sampling distribution of p-hat, remembering that it's not the true sampling distribution because we don't have all possible samples of size 50. That is the name of the distribution: this is what's called the sampling distribution of p-hat.

Now let's see if we can describe that distribution, and I'm going to start once again with the shape. Of course the shape is exactly the same as the shape of the one above it, because it's the same distribution on a different scale. So I can see that this is approximately normal, and my justification would be the large counts condition, which we already checked above. The large counts condition also allows us to say that the sampling distribution of p-hat is approximately normal.

Now let's move to the mean of the distribution. Let's think about this: what did we do to all of the values up here in order to turn them into these values down here? We divided each one of them by 50. So what do you suppose that's going to do to the mean? It's going to divide it by 50. So I'm going to take the mean from above, which was 20, and divide it by 50, and that gives me 0.40, which sure enough is exactly the value we see at the center of the sampling distribution. This is actually giving us an important formula: the formula for the mean of the sampling distribution of p-hat. We started with the formula for the mean of a binomial distribution, which is n times p, and then we divided it by 50. If we're generalizing, we're dividing by the sample size n. The n's cancel out, and what we end up with is that the mean of the sampling distribution of p-hat is just equal to p, the true proportion.

Let's move to the standard deviation of the distribution. Once again, what did we do to all of these values to turn them into these values? We divided them by 50. What do you think is going to happen to the standard deviation? It's going to be divided by 50. So I'm going to take my standard deviation from above, 3.46, and divide it by 50, and that gives me an answer of about 0.07 for my standard deviation.

Now let's see if we can generalize that into a formula for the standard deviation of p-hat. We started with the formula for the standard deviation of a binomial distribution, which is the square root of n times p times 1 minus p, and then we divided it by 50. To generalize, we're going to divide it by n, and instead of dividing by n, I'm just going to put a 1 over n out in front of it, which
is the same as dividing by n. Now we're going to have to use a little trick from our algebra class, which is to take this 1 over n and put it inside the square root. When it goes inside the square root, it becomes 1 squared over n squared, or just 1 over n squared, and then we still have n times p times 1 minus p. We can do a little simplifying here: one of these n's down here is going to cancel with this n, and look, we are left with a formula for the standard deviation of p-hat. It is the square root of p times 1 minus p on the top, with n on the bottom. So this is the formula for the standard deviation of the sampling distribution of p-hat.

We've got one final question here, and here's what it says: suppose that we took an SRS of 50 Reese's Pieces and calculated the proportion that are orange, and an independent SRS of 100 Skittles and calculated the proportion that are orange. Describe the sampling distribution of p1-hat minus p2-hat. Look, now we have two samples. This 50 is the sample size for the Reese's Pieces, and I'm going to call that n1 because that's our first sample. 100 is the size of the second sample, and I'm going to call that n2. We've got sample proportions for each one of those, and we want to be able to describe the sampling distribution of the difference between these two sample proportions.

Let's handle this like we do any time we describe a distribution, and start with the shape. It turns out you can check whether this shape is going to be approximately normal in the same way that we did up above, but we have to do it for both samples. That means that n1 times p1 and n1 times 1 minus p1, for the sample of Reese's Pieces, have to be greater than or equal to 10, but we also have to check it for the second sample: n2 times p2 and n2 times 1 minus p2 have to be greater than or equal to 10. If all four of those are true, we can say this sampling distribution is approximately normal. Next we're going to move to the
center of this sampling distribution, and then in the end we will also get to the variability. In order to get some formulas for the center and the variability, I have to take you back to something that Mr. Murphy talked about yesterday, when he was talking about combining random variables. Here's what we're going to do: we're going to think of these as random variables, and what we're doing is taking the difference between two random variables, X minus Y. If you recall from yesterday's lesson, if you want to get the mean of the difference between two random variables, it's very straightforward: you just take the mean of X minus the mean of Y. That means for us, if we want to get the mean of p1-hat minus p2-hat, we just need to take the mean of each one of those distributions and subtract. We know from the formula above that the mean is just the true proportion, so we have the true proportion for the first population minus the true proportion for the second population, p1 minus p2. There is our formula for the mean of the sampling distribution of p1-hat minus p2-hat.

Now for variability, once again you've got to go back to yesterday's lesson with Mr.
Murphy, and hopefully you remember that when you have the difference between two random variables, it actually increases the variability. There's more variability. So the question is whether you want to add the standard deviations or add the variances. Very important here: one does not simply add standard deviations. What we do add is the variances. So we know that we want to add the two variances: to get the variance of X minus Y, we take the variance of X plus the variance of Y.

Here's what that looks like if I'm trying to find the standard deviation of p1-hat minus p2-hat. I need to add variances. Right here I have the formula for standard deviation, and in order to turn that into a variance I have to square it. Squaring it is just going to get rid of the square root, so the square root goes away, and what we end up with is p1 times 1 minus p1, all over n1. That's the first variance. Then we add the variance for the second population: p2 times 1 minus p2, all over n2. That gives me the variance. In order to turn that back into the standard deviation, I need to take the square root of that big long formula, and so now we have our formula for the standard deviation.

That's a lot of formulas, but no fear: these formulas are in fact provided for you on the formula sheet for the AP exam. If you go to the second page of the formula sheet, at the very top you are going to see all of the formulas that you need for the sampling distributions for proportions, which is what we just talked about. If you have one sample, or one population, there's your mean and there's your standard deviation. If you have two samples, there's your mean and there's your standard deviation. Those match the formulas that we came up with in the activity.

That completes page number one, dealing with proportions. You probably know what's coming ahead on page number two: we are now going to enter the world of means. So there's a little
bit of a setup, and we're going to use a different applet for this one. Here's what it says, follow along with me: Reese's Pieces are available in snack-size bags. Assume that the number of total candies in each bag follows a normal distribution with a mean of 16 and a standard deviation of 5. Let me pause right there. The mean of 16: they're talking about the mean of the population of all of these snack-size bags, so we're going to use the right notation here, which is mu, and we know that's equal to 16. And then this is the standard deviation of the population, so we're going to use sigma, and we know that's equal to 5. We will use this applet to simulate taking random samples of snack-size bags.

All right, here we go. Let X equal the total number of candies in a randomly selected snack-size bag. Sketch the distribution below. You're going to have to see what this looks like in the applet, so let me jump over to it. This is the applet that we're going to be using, and that first distribution you're seeing right at the top is the distribution of all of the bags of snack-size Reese's Pieces, counting up the total number of candies in each bag. Sure enough, you'll see that on average there are going to be 16 candies in each bag, with a standard deviation of 5. They're calling this the parent population. We're going to call it the population distribution.

The first thing we need to do in the activity is sketch this distribution. Once again, this is just going to be a sketch. It does not have to be perfect, but we clearly know that it looks like a normal distribution. We know 16 is the mean, so I'm going to draw the biggest bar at 16, and then the bars get lower and lower as we move to the right and lower and lower as we move to the left. It looks something like this. Don't worry about it being perfect; just get the general idea down. First we want to name the distribution, and I told you already we
are going to call this the population distribution, because it represents the population of all snack-size bags of Reese's Pieces. Now, what does each black box represent? You can't really see it in the applet, but there's really a bunch of black boxes stacked on top of each other here. The idea is that this black box right here represents one bag, one snack-size bag of Reese's Pieces, with maybe 14 total candies in it, and this one has 20 candies in it, and this one has 25 candies in it. So we're going to say that each black box represents one snack-size bag, and what we're going to do is count the total number of candies. We're calling that random variable X.

Let's see if we can describe this distribution. The first thing we want to talk about is the shape of the distribution, and we actually know that the shape of this distribution is approximately normal. The reason I know that is, if you look above, they tell you this right in the problem. So I'm going to say that this is given in the question: we are given that the population distribution is approximately normal. Now, what is the mean? We can see from the picture that the mean is 16, but we also knew that the mean is 16 because that was given in the problem. So let's identify the mean here as being 16. And what's the standard deviation? That's also given to us in the problem. We know the standard deviation of the population is 5.

Now let's move down to the next part, where it says: let x-bar equal the mean number of candies from a random sample of five snack-size bags. So we are going to take a sample of five snack-size bags, count the number of candies in each one, and find the average from those five bags. It says: use the applet to take 10,000 samples of size five. Let me jump back to the applet and see if we can take a bunch of samples. We want to start out with a normal distribution here, and I'm going to go ahead
and click animate here. Notice what happens: we've got a sample of one, two, three, four, five bags of Reese's Pieces, and then this blue box represents the average number of candies for those five bags. Let me show you another one. Here is a different random sample of five and the mean calculated from that sample, and this is a different random sample of five and the mean calculated from that sample. I can do this over and over and over again, ten thousand times, each time taking a sample of five and then dropping down the mean from that sample of five. This is the new distribution that we want to look at. The first thing you'll notice is that it does look approximately normal, just like the population distribution, but very clearly it has a lower variability.

Let's take some of those ideas back to the activity now. The first thing we want to do is sketch a picture. It still appeared to have a mean around 16, so my tallest bar is going to be there, but it had a lower variability, so I'm going to make this distribution much skinnier than the one above it. It definitely had a lower variability than the population distribution.

Let's go ahead and answer some questions about this. I'm going to start here: what does each blue box represent? Let's say we look at this one blue box right here. That box represents a random sample of five bags and the mean number of candies calculated from that sample. This is a different random sample of five and the mean calculated from that sample, and this is a different sample, and this is a different sample, and this is a different sample. So what does each blue box represent? It represents one sample, and from that sample we're going to calculate the mean number of candies. The correct notation for the mean number of candies from the sample is x-bar. So what we have over here are many, many samples of size five and a mean calculated from each one of those
samples, and so sure enough, we can call this a sampling distribution. Once again, it's not the true sampling distribution, but I'm comfortable labeling it as such. So to name the distribution, we're going to call this the sampling distribution of x-bar.

Now let's see if we can describe that distribution, and I'm going to start with the shape. We noticed right away that the shape was also approximately normal, and if I had to justify that, I would say it's because the population distribution is approximately normal, which we were told in the problem. Now, what is the mean of the distribution? You can tell just by looking at it that the mean is 16, just like the mean of the population distribution. So we actually have an important formula here: the mean of the sampling distribution of x-bar is just equal to the mean of the population, mu.

When we move down to standard deviation, we know the standard deviation of the population distribution is 5, and very clearly the variability has decreased, so we want a number less than 5. It might make sense to divide by something, and it seems reasonable that the sample size has something to do with it: the bigger the sample size, the lower the variability. So you might think that we would want to divide by the sample size. Well, it turns out that it's actually divided by the square root of the sample size, and that has to do with that whole business about adding variances. That gives us the correct value for the standard deviation here, which is 5 divided by the square root of 5, or about 2.24. So the formula, if we generalize this, for the standard deviation of x-bar is to take the standard deviation of the population and divide it by the square root of n.

All right, now let's move on to number seven. It says: change the parent population to be skewed instead of a normal distribution. I'm going to go back to the applet now and change the parent population. Instead of being a normal distribution, let's say
that it were skewed, and you can see here it's skewed to the right. Now first, let's just acknowledge that this would be unfortunate, because now the average number of candies you're getting in a bag of Reese's Pieces is only eight. Okay, a little bit unfortunate. So let's see what happens when we take samples of size five. Once again: sample of five, find the mean, drop it down. I'm gonna do that ten thousand times real quick, and what do you notice? Well, definitely less variability, but I still think I see a little bit of a skew to the right on this one.

Okay, now let's see what happens when we increase the sample size. Down here I'm gonna change my sample size to 25, and it's going to show two different distributions for me: the first one is samples of size five, the second one is samples of size 25. And what do you notice down here? Well, once again it's less variable because our sample size went up, but also it's starting to look approximately normal. So the amazing thing here is that even though the parent population is skewed to the right, the sampling distribution is approximately normal if your sample size is big enough.

So let's go ahead and capture those thoughts in the activity here. All right, number seven, samples of size five: describe the shape of the sampling distribution. Well, for the first one, the samples of size 5, it was still a little bit skewed right, so I'm gonna say slightly skewed right. But the amazing thing was that when we jumped down to samples of size 25, the sampling distribution appeared to be approximately normal. So number 9 says: what happens to the shape of the sampling distribution as the sample size increases? Well, here's the deal: as the sample size goes up, the shape of the sampling distribution becomes closer to a normal distribution. This very, very impressive fact is one of the biggest ideas in introductory statistics, and it is what is called the central limit theorem. The central limit theorem says that as your sample size increases, the sampling distribution gets
closer and closer to normal, no matter what the population distribution looks like. Skewed left, skewed right, bimodal, doesn't matter. Now how big of a sample size do we need in order to be comfortable saying the sampling distribution is approximately normal? It kind of depends on who you ask or what book you look in. One of the generally accepted values is that if n is greater than or equal to 30, you're usually pretty safe there.

All right, last question here, number 10. Suppose that we took an SRS of 25 Reese's Pieces snack-sized bags and calculated the mean number of candies, x-bar 1, and an independent SRS of 40 Skittles snack-sized bags and calculated the mean number of candies, x-bar 2. Okay, you can see the parallel to what we talked about on the previous page: we now have two samples. So let's go ahead and talk about the shape, the center, and the spread for this distribution, and I'm going to write it as variability here.

Okay, first of all, for the shape, we're gonna hope that somehow we can arrive at the fact that it is approximately normal, and there are a couple of different ways to arrive at that, which I will summarize when we get to the important ideas. Now for the center of the sampling distribution, we want the mean of x-bar 1 minus x-bar 2, and really we just need to take the mean of the first distribution minus the mean of the second distribution. We know up here that the mean is just mu, so it's just going to be the mean of the first population minus the mean of the second population, mu 1 minus mu 2.

Now for the standard deviation of x-bar 1 minus x-bar 2, you might think that you need to subtract standard deviations, but we know that the variability increases, so we need to add. But remember, one does not simply add standard deviations; we have to add the variances. So I need to take this and turn it into a variance by squaring it. That's gonna give me sigma 1 squared over n, because this square root is gonna go away, and then I need to add the variance of the second distribution. It should be n 1 here, and this
should be n 2. Now that gives me the variance. In order to turn it back into a standard deviation, I just need to take the square root of that whole thing: the square root of sigma 1 squared over n 1 plus sigma 2 squared over n 2. Now once again, a lot of formulas there, but the nice thing is that on your formula sheet here, the next thing you see is sampling distributions for means. If you have one sample, there are your formulas; if you have two samples, there are your formulas.

So let's see if we can go ahead and summarize our learning from today's lesson. Here's how I'm going to organize the important ideas. Of course this is all about sampling distributions, but there are two different worlds in sampling distributions: there is the world of proportions and there is the world of means. What I'd like to do here is compare how those two sampling distributions look in terms of three very important things: the shape, the center, and the variability.

So let's start with the shape of the sampling distribution for a proportion. We decided that hopefully we can arrive at the fact that it is approximately normal, and the way that we check that is by using the large counts condition, just like we did for the binomial distribution, which says that n times p is greater than or equal to 10 and n times 1 minus p is greater than or equal to 10. Now for the sampling distribution of the mean, once again we would like to arrive at the fact that the sampling distribution is approximately normal, and we have a couple of different ways that we can arrive at this. The easiest way, and the way we used in the activity, is that we were given in the problem that the population distribution is normal. So of course, if the population distribution is normal, then the sampling distribution is approximately normal. The second way, which we arrived at at the end of the activity, is using this idea of the central limit theorem: if your sample size is large enough, so I'm going to say here sample size is large, and for most textbooks that's n greater than or equal to 30, then by this
central limit theorem, CLT, we know the sampling distribution is approximately normal. Now there is a third special case, where you don't know anything about the population distribution and your sample size isn't big enough, where it's less than 30. In that case you need to make a graph of your sample data, a dot plot, a histogram, or a box plot, and what you'd like to see is that the sample data show no strong skew or outliers, which would mean that it's reasonable to assume that the sample came from a population distribution which is approximately normal, which then allows us to say the sampling distribution of x-bar is approximately normal.

Okay, now let's move to the center. This one will go pretty quickly. The center of the sampling distribution for p-hat is just the true parameter, the true proportion of the population, p. Over here for means, the mean of the sampling distribution of x-bar is just equal to the true mean of the population, mu.

And then finally, let's write our formula for standard deviation as a measure of variability. The standard deviation of p-hat, which we arrived at in the activity, is the square root of p times 1 minus p, all over n, which we derived from the binomial distribution formula for standard deviation. And then over here for means, the standard deviation of x-bar is equal to sigma over the square root of n. Now I've just given you the formulas for one sample. We did also derive the formulas for two samples; I'm not going to put those into the important ideas. You can always look those up in the formula sheet for the AP exam.

Now we have a homework question for you here. It is a multiple-choice question. If you could try that before tomorrow's lesson, I will be with you once again tomorrow. Tomorrow is going to be FRQ Friday, which means I'm going to bring three of my favorite FRQs, and we're going to go through how to look at a model solution but also the rubrics that are used to grade those. So I certainly hope that you'll be with me again tomorrow, and have a good evening.
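
For reference, the center and variability formulas stated aloud in this lesson can be written out in symbols. This is a summary sketch rather than a copy of the slide or the AP formula sheet: the subscript notation is an assumption about how the spoken formulas would be typeset, and the last line plugs in the population standard deviation of 5 and sample size of 5 from the Reese's Pieces activity.

```latex
\begin{align*}
\mu_{\hat{p}} &= p, &
\sigma_{\hat{p}} &= \sqrt{\frac{p(1-p)}{n}} \\
\mu_{\bar{x}} &= \mu, &
\sigma_{\bar{x}} &= \frac{\sigma}{\sqrt{n}} \\
\mu_{\bar{x}_1 - \bar{x}_2} &= \mu_1 - \mu_2, &
\sigma_{\bar{x}_1 - \bar{x}_2} &= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \\
\text{Activity: } \sigma_{\bar{x}} &= \frac{5}{\sqrt{5}} \approx 2.24
\end{align*}
```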

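The applet demonstration from the lesson can be imitated with a short simulation. The sketch below is not the applet's code: the function names are hypothetical, and the populations are assumptions chosen to resemble the lesson (a normal population with mean 16 and standard deviation 5, and a right-skewed exponential population with mean 8, which also has standard deviation 8). It takes 10,000 random samples, records each sample mean, and compares the standard deviation of those means with sigma over the square root of n.

```python
import math
import random
import statistics


def simulate_sample_means(draw, n, reps=10_000, seed=0):
    """Return `reps` sample means, each computed from a random sample of size n."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]


def skewness(values):
    """Mean cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


# Assumed populations (not the applet's actual settings).
def normal_bag(rng):
    """Candies per bag from a normal population: mean 16, SD 5."""
    return rng.gauss(16, 5)


def skewed_bag(rng):
    """Candies per bag from a right-skewed population: exponential, mean 8, SD 8."""
    return rng.expovariate(1 / 8)


if __name__ == "__main__":
    means = simulate_sample_means(normal_bag, n=5)
    print("normal population, n=5")
    print("  mean of x-bar:", round(statistics.fmean(means), 2))
    print("  SD of x-bar:  ", round(statistics.pstdev(means), 2))
    print("  sigma/sqrt(n):", round(5 / math.sqrt(5), 2))

    for n in (5, 25):
        means = simulate_sample_means(skewed_bag, n=n)
        print(f"skewed population, n={n}")
        print("  SD of x-bar:  ", round(statistics.pstdev(means), 2))
        print("  sigma/sqrt(n):", round(8 / math.sqrt(n), 2))
        print("  skewness:     ", round(skewness(means), 2))
```

With the skewed population, the skewness of the simulated sample means should shrink toward 0 as n goes from 5 to 25, which is the central limit theorem behavior described in the lesson.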