Iterations using a combination of filtering conditions

So last time we looked at filtering: we go through the cards and, based on some characteristic or property of each card, decide whether it is useful or not. For instance, we looked at pronouns and at verbs, and we said we could keep track of pronouns and verbs at the same time, because a word is either a pronoun or a verb. Now let us do something more interesting. I am going back to the classroom data set. Let us say that in this data set I want to find not just how many girls there are, but how many girls there are from a specific city, say Chennai. Okay. So how many girls are from Chennai, how many Chennai girls are in this data set? There are two fields involved, gender and town/city, and what we want is that the gender should be female and the town/city should be Chennai. Yeah, how would we do that? Is there a simple way of doing it? What we could do first is separate out all the girls, so that would be one step of filtering. Want to try that? Yeah. So here is a girl, so I guess I put it in a different pile; here is also a girl; girl, girl, boy, boy, girl, girl. Girl. Girl, boy. Boy, boy, boy, boy, boy, one more girl, boy, boy, girl, boy, boy, boy, boy, girl, boy, boy, girl, girl, girl. So now we have filtered this data into two piles. Everything in this pile is a boy: not useful. Everything in that pile is a girl: these are the ones we are interested in. Now, among the girls, we want those who are from Chennai. So we move the boys aside, keep them separately, and go through the girls again, this time looking at the town/city field. So this is not Chennai, Bengaluru, not Chennai, this is Chennai, this is Chennai, this is Chennai, this is Chennai, not Chennai, this is Chennai, not Chennai, not Chennai, not Chennai, not Chennai, and not Chennai.
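The two-stage filtering done by hand above can be sketched in Python as below. This is a minimal illustration, not the course's own notation: each card is assumed to be a dictionary with hypothetical keys `gender` and `city`, and the six cards are made-up sample data rather than the classroom data set.

```python
# Hypothetical sample cards (not the real classroom data), keeping only
# the two fields this filter needs.
cards = [
    {"gender": "M", "city": "Chennai"},
    {"gender": "F", "city": "Trichy"},
    {"gender": "F", "city": "Chennai"},
    {"gender": "M", "city": "Madurai"},
    {"gender": "F", "city": "Chennai"},
    {"gender": "F", "city": "Bengaluru"},
]

# Stage 1: go through all the cards and keep only the girls' pile.
girls = [card for card in cards if card["gender"] == "F"]

# Stage 2: go through the girls' pile and keep only those from Chennai.
chennai_girls = [card for card in girls if card["city"] == "Chennai"]

print(len(girls), len(chennai_girls))  # 4 2
```

Each stage is a complete pass over its own pile, so the girls' cards are looked at twice in total.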
So now we have pulled out those cards that combine two different properties we are interested in: they are girls and they are from Chennai. And we can count them: 1, 2, 3, 4, 5. Remember, we had counted 30 students at the beginning, and among those we have found that there are 5 girls from Chennai. What we have done here is filtering in two stages: first we applied a gender filter to pull out all the girls, and then we applied a city filter to pull out the ones from Chennai. I guess we could have done it the other way: we could first have pulled out all the Chennai people and then looked for the girls among them. That should give the same answer, right? Because the same cards would come out. Yes, we could have filtered on Chennai first. Yeah, but suppose we want to do it in one step. Instead of going through and explicitly pulling out all the girls, and then going through all the girls and pulling out the Chennai people, do you think we can do it in one shot, like we did for some of the other iterations where we checked several things at once? I guess we should be able to: just go through the cards in one iteration, and at each step look at the card and see whether it is a female. But that is not enough; we also need to check the city. So you have to check two conditions, and both of them have to be true. So it is an 'and' of two conditions: the first condition is that the gender is F, and the second condition is that the town/city is Chennai. Only if both of them are true do we select the card for counting. So we keep a count variable, and we start it at 0. Yeah.
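A sketch of the one-shot version, under the same kind of assumptions (hypothetical dictionary cards with `gender` and `city` keys, made-up data): a single loop with a counter that starts at 0 and is incremented only when both conditions hold. For comparison it also filters in the opposite order, city first and then gender, which selects the same cards.

```python
# Hypothetical sample cards (not the real classroom data).
cards = [
    {"gender": "M", "city": "Chennai"},
    {"gender": "F", "city": "Trichy"},
    {"gender": "F", "city": "Chennai"},
    {"gender": "M", "city": "Madurai"},
    {"gender": "F", "city": "Chennai"},
    {"gender": "F", "city": "Bengaluru"},
]

def count_chennai_girls(cards):
    count = 0  # start the counter at 0
    for card in cards:
        # Select the card only if BOTH conditions are true.
        if card["gender"] == "F" and card["city"] == "Chennai":
            count += 1
    return count

# Two-stage filtering in the other order: city first, then gender.
chennai = [c for c in cards if c["city"] == "Chennai"]
chennai_then_girls = [c for c in chennai if c["gender"] == "F"]

print(count_chennai_girls(cards), len(chennai_then_girls))  # 2 2
```

Both routes give the same answer because a card passes exactly when both conditions are true, whichever one is checked first.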
So we could do that; I will keep a count. I will call it Chennai girls, and set Chennai girls equal to 0. Then we go through the cards. Here is a card which is a male, so it does not satisfy the condition. If it is a male, we do not even have to look at the city. We do not have to look at the city; that is interesting. Again, here it is a male, so we do not have to look; male, male. Now a female, so we look at the city: Trichy, so it is not Chennai. Again, female, Theni; female, Bangalore; female, Madurai; female, Chennai. So now I have to increment my count. Increment the count; another female from Chennai, increment again. One more female from Chennai (they all come together, I guess, because you put the cards together), another female from Chennai, and then one more female from Chennai. Then there is a female from Madurai, Erode, a female from Nagercoil, Bangalore, male, male, male, male; the rest, I think, are not cards that are useful to us, but we still go through all of them. So now we have the count in one iteration: we kept track of a variable called Chennai girls, and we incremented that variable only when both conditions were satisfied. It is an 'and' of two conditions: the gender being female and the city being Chennai. So this is a very useful thing. We know that we can take the entire stack of data, or sequence of data, or cards, whatever we want, and pull out those which are interesting based on some combination of properties, in this case the gender and the city. If it were the shopping bills, it could be whether the total is bigger than some amount and the customer has bought a certain type of item; in the case of the words, it could have been whether the word is a noun and is more than five letters long. So we can take various combinations of conditions and filter on them simultaneously. Is there something a bit more interesting we can do? For example, can we find all those people who were born between one date and another date?
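The observation that a male card never needs its city checked corresponds to short-circuit evaluation of `and` in Python: if the left operand is false, the right operand is not evaluated at all. The sketch below makes this visible by counting how often the city is actually read; the helper `is_from_chennai` and the sample cards are hypothetical.

```python
# Hypothetical sample cards (not the real classroom data).
cards = [
    {"gender": "M", "city": "Chennai"},
    {"gender": "M", "city": "Erode"},
    {"gender": "F", "city": "Trichy"},
    {"gender": "F", "city": "Chennai"},
    {"gender": "M", "city": "Madurai"},
    {"gender": "F", "city": "Chennai"},
]

city_checks = 0  # how many times we actually had to read the city field

def is_from_chennai(card):
    """Hypothetical helper: checks the city and records that we looked."""
    global city_checks
    city_checks += 1
    return card["city"] == "Chennai"

chennai_girls = 0
for card in cards:
    # `and` stops as soon as the left side is False,
    # so is_from_chennai() is never called for a male card.
    if card["gender"] == "F" and is_from_chennai(card):
        chennai_girls += 1

print(chennai_girls, city_checks)  # 2 3
```

Only the three female cards trigger a city check; the three male cards are rejected on the gender test alone.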
Yeah, let us try that. Let us see if we can find all the people who were born in the first half of the year, say between 1st January and 30th June. It should be half? We do not know. We do not know, but let us see whether that is true. If birthdays were random it would be about half, but this may not be random. So let me keep track of a count for this. Notice that here we are not checking two separate fields; we are checking only one field, the date of birth, but now we are comparing it against two values. We could still use two fields, though: we could keep separate counts for males and females. How many boys were born in the first half of the year, and how many girls? Girls born in the first half. So, to check whether a date is in the first half: if it is 22nd July, is it in the first half or not? No, it is not in the first half. Anything in the first half has to be up to 30th June. So we check the date: it should be greater than or equal to 1st January and less than or equal to 30th June. That is right; that is what it means. So let us check. This is 22nd July, so it does not meet the criterion. 4th March? Yes, it meets the criterion because it is between 1st January and 30th June, so it is in that range. And it is a male. And it is a male, so I will increment the male count by 1. The next one is 17th September, so again in the second half. 30th August: second half. Second half because it is after 30th June. 6th May: this one lies between 1st January and 30th June. And it is a male. We should count it; it is a male, so it goes to the male count. 13th October is again in the second half. 3rd June is before 30th June, so it is in the first half: male.
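The date condition can be written as a small predicate. This sketch assumes the date of birth is available as a day and a month (the year does not matter for this question) and relies on Python's lexicographic tuple comparison, so `(month, day)` is compared month first and then day. The function name `born_in_first_half` is hypothetical. The lower bound `(1, 1)` is always satisfied by a valid date; it is kept only to mirror the "between 1st January and 30th June, inclusive" phrasing.

```python
def born_in_first_half(day, month):
    """Hypothetical predicate: True if (day, month) falls in 1 Jan .. 30 Jun inclusive."""
    # Tuples compare element by element: month first, then day.
    return (1, 1) <= (month, day) <= (6, 30)

print(born_in_first_half(22, 7))  # False (22 July)
print(born_in_first_half(4, 3))   # True  (4 March)
print(born_in_first_half(30, 6))  # True  (30 June, boundary included)
print(born_in_first_half(1, 7))   # False (1 July)
```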
4th January, again male; it looks like there are more males. 14th December? No. 30th December, no; 7th November, no; 30th April, male. So that is now 5. 26th December, male, but that is the second half. 13th May: the first time we have got a female; 13th May is in the first half. 17th July? No, too late. Again no. 9th October? No. 10th September? No. 12th January? Yes, female. 16th May, again female. Yes. 8th February, female; 14th January, female; 5th May, female, wow. 17th November, so this does not count. 15th March: I am running out of space, so I am moving to the left; I will say 7 now. 22nd September? No. 23rd July? No. 23rd March? Yes, so the male count now goes to 6. 15th March, again male: the male count goes to 7. 28th February, male. Okay, 8. 6th December does not count. So what do we get? Of the 30 students, 8 males were born in the first half and 7 females. So exactly half the students, 15 of them, were born in the first half. That is not bad, which is surprising. And of those 15, which is an odd number, the split is roughly equal: 8 males and 7 females. So it looks like a very balanced data set. So this is interesting: you can filter on multiple things at the same time, and when you are filtering on multiple things, you can also keep track of the results separately. We have two conditions, when they were born and whether they are male or female, and based on the combination we keep one count for males born in the first half and one count for females born in the first half.
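Combining the date filter with a gender check gives the two-counter iteration just carried out. This is a sketch under stated assumptions: cards are hypothetical `(gender, day, month)` tuples with made-up values, and `born_in_first_half` is an illustrative helper that checks the inclusive range 1 January to 30 June.

```python
# Hypothetical cards: (gender, day, month). Years are omitted; they do not matter here.
cards = [
    ("M", 22, 7),
    ("M", 4, 3),
    ("F", 13, 5),
    ("M", 30, 6),
    ("F", 17, 11),
    ("F", 12, 1),
    ("M", 1, 7),
]

def born_in_first_half(day, month):
    # Inclusive range 1 January .. 30 June, compared as (month, day) tuples.
    return (1, 1) <= (month, day) <= (6, 30)

male_first_half = 0
female_first_half = 0
for gender, day, month in cards:
    if born_in_first_half(day, month):
        # Same date filter for everyone; the gender decides which counter goes up.
        if gender == "M":
            male_first_half += 1
        elif gender == "F":
            female_first_half += 1

print(male_first_half, female_first_half)  # 2 2
```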
And we have used the same pattern, the iterator pattern. For the initial values, we kept two count variables, males born in the first half and females born in the first half, both starting at 0. Then we went through the iteration, each time filtering on the gender as well as the date of birth, checking whether the date-of-birth field falls between two values, 1st January and 30th June inclusive; if it does, we add 1 to the male count or to the female count. That is what we have done. So it is just the iteration pattern with a filter added: filtered iteration. Very good. So let us do something more interesting with this set. Here we have the maths marks highlighted. Earlier we computed the average marks for the whole class, but suppose that, as a teacher, I wanted to find out who is doing better in general in the class: are the girls doing better in maths, or the boys? What would be a good way to look at it? Should we look at the highest marks? I am not very sure about the highest marks, because there could be one exceptional boy, let us say. The girls could in general be doing better than the boys, but there may be one exceptional boy, and by looking only at the maximum you would be biasing the comparison towards this one exceptional candidate. So I do not think we should look at the maximum. If you want to look at the general trend, I guess the average is a good measure. The average should work, I think: if you find the average of the boys and the average of the girls and compare them, and the average of the girls is higher than that of the boys (which I think is likely), then the girls are doing better. This is what I would think is the right way of doing it.
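A small, made-up example of why the maximum can mislead: one exceptional boy gives the boys the higher maximum even though the girls score higher on average. The marks below are hypothetical, not from the classroom data set; the `average` helper computes the average as in the lecture, by adding up the marks and dividing by the number of cards.

```python
# Hypothetical maths marks: one exceptional boy (98); otherwise the girls score higher.
boys = [98, 55, 60, 58, 62]
girls = [80, 78, 82, 79, 81]

def average(marks):
    # Add up the marks and divide by the number of cards.
    return sum(marks) / len(marks)

print(max(boys), max(girls))          # 98 82   -> the maximum favours the boys
print(average(boys), average(girls))  # 66.6 80.0 -> the average favours the girls
```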
So, we saw how to compute the average: we find out how many cards there are, add up the marks across those cards, and divide by the number of cards. But now we have to separate the cards; we have to filter them into those for the boys and those for the girls. So are we finding the average total marks for the boys? The maths total. The maths total: we are checking whether girls are doing better than boys in maths. In maths, yes. So we find the average maths marks for the girls and the average maths marks for the boys and compare the two; that is what we want to do. A simple way would be to separate the boys from the girls, and then find the average for each set as we did last time; last time we found the average for the whole set. So we filter it into two sets, boys and girls. Boys and girls. Then we process the boys separately. Find the average for the boys separately, the average for the girls separately, and compare the averages. But why do we need to separate them? Why can't we just do it as we go along? Surely, like we did earlier, we can keep track of all these things in one single iteration. To find the average of the boys, what do we need? We need the total mathematics marks of only the boys and the total number of boys; dividing the total mathematics marks of the boys by the total number of boys gives the average for the boys. Similarly, for the girls we need the total mathematics marks of the girls and the total number of girls, and dividing the first by the second gives the average for the girls. So it looks like we have to keep track of 4 things. Let me write it down: let me break up this space into four parts. We need the boy count.
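The "separate first, then average each pile" approach can be sketched as follows. As sample data it uses the first seven cards read out in the lecture (gender and maths mark only), written as `(gender, mark)` tuples; this representation and the `average` helper are illustrative assumptions. Averaging each pile afterwards means going through the boys' and girls' cards a second time.

```python
# First seven cards read out in the lecture: (gender, maths mark).
cards = [("M", 72), ("F", 74), ("M", 81), ("M", 74), ("F", 62), ("F", 97), ("M", 44)]

def average(marks):
    # Add up the marks in a pile and divide by the number of cards in it.
    return sum(marks) / len(marks)

# First pass: sort the cards into two piles.
boys_marks, girls_marks = [], []
for gender, mark in cards:
    if gender == "M":
        boys_marks.append(mark)
    elif gender == "F":
        girls_marks.append(mark)

# Then process each pile separately.
print(round(average(boys_marks), 2), round(average(girls_marks), 2))  # 67.75 77.67
```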
And of course we need the girl count. These tell us how many of each there are, and then we keep track of the boys' sum and the girls' sum. And we are going to do all of it in one iteration as we go through the cards. As we go through in one iteration. So there is some filtering going on here, because we look at whether it is a boy or a girl, and then we use accumulation, because we are adding up the mathematics marks, and we are also counting. So we are counting, accumulating, and filtering all together in one shot. Let us try it. Here is the first card; this is a boy, so the boy count is now 1. And the maths mark is 72, so the boys' sum is now 72. The second one is a girl. The girl count is 1, and 74 is the maths mark. The third one is again a boy, so the boy count is 2. Maths is 81, and 72 plus 81 is 153. The next card is a male again: boy count 3. 74 is the maths mark, so that is 227. The next is a girl, and 62 is the maths mark, so the girls' sum is 136. Here is a fantastic maths mark by a girl who got 97, so that is 233. And here is a boy who is not doing that well: maths mark 44, so that is 271. We can go through the whole set like this, I guess, and at the end we will have the totals. We have the totals; we have gone through the whole list now. What do we get? For the whole list, there were 17 boys and 13 girls; the sum for the boys was 1220 and the sum for the girls was 951. Now what we wanted was the average. Average. So we need to divide: 1220 divided by 17 is about 71.76, and 951 divided by 13 is about 73.15.
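The single-pass version with four variables (boy count, girl count, boys' sum, girls' sum) can be sketched like this, again using the first seven cards read out in the lecture as `(gender, mark)` tuples. The tuple representation and the function name `boys_and_girls_totals` are hypothetical. The running totals match the figures called out above (271 for the four boys, 233 for the three girls), and the last two lines apply the division to the full-deck totals reported in the lecture.

```python
# First seven cards read out in the lecture: (gender, maths mark).
cards = [("M", 72), ("F", 74), ("M", 81), ("M", 74), ("F", 62), ("F", 97), ("M", 44)]

def boys_and_girls_totals(cards):
    """One pass: filter on gender, count the card, accumulate the maths mark."""
    boy_count, boy_sum = 0, 0
    girl_count, girl_sum = 0, 0
    for gender, maths in cards:
        if gender == "M":
            boy_count += 1
            boy_sum += maths
        elif gender == "F":
            girl_count += 1
            girl_sum += maths
    return boy_count, boy_sum, girl_count, girl_sum

print(boys_and_girls_totals(cards))  # (4, 271, 3, 233)

# Totals reported in the lecture for the full deck of 30 cards.
boy_count, boy_sum, girl_count, girl_sum = 17, 1220, 13, 951
print(round(boy_sum / boy_count, 2), round(girl_sum / girl_count, 2))  # 71.76 73.15
```

Compared with sorting into piles first, this version looks at each card exactly once and never stores the piles.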
So it looks like the girls are doing marginally better than the boys, but not by much: 73.15 is just a little more than 71.76. So the girls are doing slightly better, though with numbers this small (13 girls and 17 boys) you cannot really say much. But what is interesting is that we were able to do this in one scan: we iterated over this set of cards once and kept track of these four outputs. Four variables. As we went along, we filtered on the gender, and for each gender we accumulated both the number of cards and the total maths marks; finally we had all the totals and were able to compute the averages. So would you say... we had this original question, "Are the girls doing better than the boys?", which seemed like a very interesting question, the kind of question we normally ask, and we have been able to turn it into a procedure: an iterator with some filtering and some variables to keep track of. And we have been able to give an answer that is numerical. Yes, one that we can justify. Justify. So would you say that this is an algorithm: that we have taken this problem, "Are girls doing better than boys?", and found an algorithm for it? Would you describe it like that? Yeah, that is a good way of thinking about it. We have taken a question which is somewhat subjective, which you could debate about, and we have given a systematic procedure that uses the available data to calculate an answer and either confirm or refute the hypothesis. If we hypothesise that girls are doing better than boys, then this is one way to check whether that is true or not. So this is very interesting: you can take a question that looks somewhat vague and make it precise by giving a criterion for it and an algorithm to compute it. Okay. Very good.