Chomsky UCLA Lecture 2 (April 30, 2019)

Welcome, everyone, to the second lecture this week by Noam Chomsky. I already introduced him yesterday, so you don't need another introduction today. Let's start off. Noam, welcome.

Well, I'm impressed; I was told that everybody would leave after the first lecture. Well, to recapitulate briefly: of the many different ways in which one can approach language, we've focused on one, selected one particular one, what's been called the generative enterprise, which is concerned with what's been called the basic property of language: namely, that each language constructs in the mind an infinite array of structured expressions, each of which has a semantic interpretation that expresses a thought, and each of which can be externalized in one or another sensory-motor system, typically sound but not necessarily. As I mentioned, this approach revives a traditional conception that goes back to the 17th century and lasted as far as the early 20th century; the last representative was Otto Jespersen. Then it was forgotten, then revived without knowledge of the tradition. Well, within the generative enterprise we have also settled on a particular approach, which is not universal, not required, namely what's called the biolinguistic program: that is, regarding a language that a person has as a trait of the person, a property of the person, sort of like the person's immune system or the person's visual system. Like other such systems, it is a particular instance of something more general: there is a general mammalian visual system, which is innately determined, and there's a particular visual system that's the result of interaction with experience and other factors that govern growth. The same is true of the internal language, for clarification called I-language, to make it clear that there's a particular technical notion involved. "I" fortuitously stands for internal, individual, and also intensional with an "s": namely, we're interested, for the purposes of science, in the actual method by which this basic property is expressed,
not just any method that happens to yield the same external consequences. So that's the approach. Within this approach there is at once a notion of what constitutes a genuine explanation. A genuine explanation will have to meet two conditions: the condition of learnability, that the system has to be able to be acquired by an individual, and also evolvability, that the innate internal system, the faculty of language, has to have evolved. So if a device is used for a description that can't meet those two conditions, we don't have a genuine explanation; we have a description. Descriptions can be very valuable, and it's much better to have organized descriptions than chaos, but it falls short of genuine explanation. Now, that's a very austere requirement, and until recently it hasn't even been possible to consider it, but it is the goal to be aimed at by any approach within the sciences, or rational inquiry generally. Well, the early work from the fifties distinguished two different aspects of I-language; the term came later, but that doesn't matter. One of them was compositionality; the other was dislocation. For compositionality, the early approach that was adopted was phrase structure grammar; for dislocation, transformational rules. It was assumed at the time, and in fact for a long time up to the present, and it is still widely believed, that dislocation is a strange property of language: you wouldn't expect it, it's kind of an imperfection, as it was called, that you have to somehow explain away. Compositionality was taken for granted. As I mentioned, and as we'll talk about today, it seems like the opposite is true, which is kind of an interesting result. Phrase structure grammar had two fundamental problems, which were already recognized by the sixties. One is that it simply allows way too many rules; that alone disqualifies it. You can't have a system that permits all sorts of impossible rules. The second problem, which was more subtle, is that it conflates three different aspects of
internal language which really should be treated separately; that's become clearer over the years. One is simply compositionality, structure, the core element of the basic property. The second is linear order, which is not part of the basic property. The third is projection, that is, determining what each structured element is, what kind of an element it is. Phrase structure grammar tries to handle all three, and it's become clearer over the years that that's not the right way to proceed. Well, by the late sixties all of this was becoming partially clear, and the approach turned to what was called X-bar theory to overcome the problems of phrase structure grammar. X-bar theory takes care of some of them. It eliminates the main problem, of far too many rules; in fact there are basically no rules in an X-bar-theoretic approach, so we don't have all these impossible rules hanging around. It also distinguishes order from structure, so there's no order in X-bar theory. There is still the conflation of projection and structure, which only recently, in the last couple of years, was recognized to be a basic problem, primarily because of the existence of exocentric constructions, which are excluded by X-bar theory although they're all over the place. As I mentioned, in order to sort of get around this annoyance, special tricks were invented to pick out what is intuitively the right answer, but without any particular basis. So there's nothing in X-bar theory that tells you that if you have an NP-VP structure you should regard it as essentially verbal rather than essentially nominal, or as an inflectional element: it's totally arbitrary. And that of course is a very serious problem. It's overcome, I think, by labeling theory, which I'll simply assume you're familiar with; I don't have time to go into it, but it's an approach which overcomes this problem, and it has furthermore the advantage of accounting for when dislocation
operations, movement operations, may take place or must take place, and when they have to terminate; it does that automatically. But put that all aside. X-bar theory also, although this was not recognized at the time, forces you to have a principles-and-parameters approach, because you have the principles of X-bar theory, but then you have to somehow get the order that's in particular languages. So English and Japanese are sort of mirror images at the core; you have to somehow state that separately. Now, that led right off, within a couple of years, to the general principles-and-parameters approach, which raises interesting questions of its own. Where do the parameters come from? Do they have to evolve? Well, if they have to evolve, then we're very far from genuine explanations. And in fact the simple case that X-bar theory exposes points to what should be the answer. So for the case of, say, whether you have verb-object or object-verb order: that has to be stated for each particular language, but the answer to it has no effect on the semantic interpretation, so it doesn't feed the conceptual-intentional level. Which is suggestive. It suggests that whatever the parameters are, and if you think about the other parameters that have been proposed, say the null subject parameter or others, they have the property that they do not enter into determining semantic interpretation. So if you take, say, Richard Kayne's proposals about a basic subject-verb-object versus a basic subject-object-verb order: it doesn't make any difference to the semantic interpretation. All of that is very suggestive. If the parameters don't enter into semantic interpretation, we have to ask why they are there altogether; after all, the core part of language is just expressing thoughts, basically having semantic interpretations. So it begins to look right off as if the parameters should be external to the core part of language.
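The point that a parameter like word order lives only in externalization can be illustrated with a toy sketch (an editorial Python illustration, not from the lecture): the internal object is order-free, and the head parameter is consulted only when it is pronounced.

```python
# Editorial toy sketch: the internal structure carries no linear order
# (a dict with labeled, unordered slots); the head parameter only
# matters at externalization.

def linearize(phrase, head_initial):
    """Pronounce an order-free structure. head_initial=True gives
    English-like verb-object order; False gives Japanese-like
    object-verb order."""
    if isinstance(phrase, dict):
        head = linearize(phrase["head"], head_initial)
        comp = linearize(phrase["comp"], head_initial)
        return f"{head} {comp}" if head_initial else f"{comp} {head}"
    return phrase

vp = {"head": "read", "comp": "books"}    # one internal structure
print(linearize(vp, head_initial=True))   # read books  (English-like)
print(linearize(vp, head_initial=False))  # books read  (Japanese-like)
```

Semantic interpretation would look only at the structure `vp` itself, which is why the choice of order never affects it.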
I will come back to that, but it looks more and more plausible to me that that's the case. What about their evolution? Well, think again about the head parameter. It didn't evolve; it's just forced by the requirement of having to externalize in a sensory-motor system that requires linear order. So it doesn't evolve at all. It's just a requirement: there's a mismatch between the internal system and the sensory-motor system, and that mismatch just has to be resolved one way or another, but there's no evolution of the parameter. And in fact the goal ought to be to try to show that all parameters have this property. There's very interesting work on this by the EKS group, Epstein, Kitahara and Seely, who study a particular array of dialects and try to show that the differences between the dialects simply reduce to an unstated property of the set of principles: it allows alternative orders of rules, and it turns out the several dialects just pick different answers to this. So if that kind of approach can work for all parameters, then the problem of their evolution is overcome, and we go back to having a hope of getting genuine explanations. Well, this is all kind of work in progress; in fact, not even in progress, it's an open problem to be investigated: to try to show this for all parameters, all of the options that distinguish languages. The goal would be to show that they have two properties: first, that they're simply part of externalization, that they don't feed the conceptual-intentional level, semantic interpretation; and secondly, that they don't evolve at all, that they are simply options left open by core grammar which have to be settled one way or another, so each language picks its way of settling them. That would be the perfect answer, if you can find it. Well, X-bar theory partially dealt with these problems. It still left the problem of conflating projection and structure, and it had the huge problem of excluding
exocentric constructions, which, as I say, abound. So that's no good. The principles-and-parameters approach that developed mainly through the eighties led to a huge explosion of empirical work on a very wide variety of typologically varied languages; I think more was learned about language in the eighties than in the preceding couple of thousand years. There were also conceptual steps forward, but still there was the problem of not being able to reach genuine explanations. Well, by the early nineties it appeared to a number of people, myself included, that enough had been learned so that it might be possible to tackle the whole general problem of language on more principled grounds, that is, to actually seek genuine explanations: to formulate what was called the strong minimalist thesis, which says that language is basically perfect. It's what has been called approaching universal grammar from below: starting by assuming a perfect answer, which would be a genuine explanation, let's see how far we can get, how much we can explain this way. Now, that's what's been called the minimalist program. The way to proceed with it is to start with the simplest computational operation and just see how far you can go, where you run aground. What you have to do is explain the actual phenomena of language on the basis of the simplest computational operation. That would be the genetic element, plus other factors that of course enter into the acquisition of language, and there are several such factors. One, of course, is the data with which the child is presented: we don't all speak Tagalog, so it's got to be something in the data that picks out which language we end up with. And it should be the case that the correct theory, when we converge on it, will show that very limited data suffices to fix the I-language. There's an empirical reason for that: psycholinguistic studies, child studies, have increasingly shown
that the basic properties of language are really captured, known, quite early, with very limited evidence available. There are other factors, sometimes called third-factor properties: just general principles of growth and development which are independent of language, maybe natural laws. For a computational system like language, the natural one to look at is computational efficiency, just assuming, for the reasons I mentioned last time, that nature seeks the simplest answer. So computational efficiency, plus the simplest computational operation, plus whatever contribution the data makes: that should yield the I-language attained. That's the goal. This program has been called the minimalist program. Where this can be achieved, you do have genuine explanation. If you can explain something simply on the basis of the first and the third factor, namely universal grammar, the simplest computational operation, along with, say, computational efficiency; if anything can be explained in just those two terms, you do have a genuine explanation. You've solved the learnability problem, because there's no learning; that's the optimal solution to the learnability problem. You've also addressed the evolvability problem in the best possible way, if you think it through. The basic property is simply a fact; it's a fact about the faculty of language that it satisfies the basic property. It therefore follows that there has to be some computational operation that yields the basic property, and whatever computational operation it is, it's going to incorporate the simplest computational operation, and optimally nothing else, or not much else. So if you can reduce something to the simplest computational operation, you have solved the problem of evolvability: even if you don't know the exact answer as to how this rewiring of the brain took place, you know it must have happened. So you're free and clear. So whenever you can account for something on the basis of just the
first and the third factor, universal grammar and third-factor properties like computational efficiency, you certainly have a genuine explanation. You could have more complex genuine explanations if you bring in the effect of whatever the right theory of determining the I-language from data turns out to be; when that factor, the second factor, is introduced, you can have richer kinds of genuine explanations. But the simplest kind will simply reduce to the simplest computational operation plus computational efficiency. Well, notice of course that the innate properties, the simplest computational operation, these things may be triggered by data, by evidence, but that's normal for innate properties: you have to have triggering evidence to get the system to start functioning. So in the case of the visual system, for example, it's known for the mammalian visual system, including humans, that unless you have the right kind of stimulation in very early infancy, in fact in the first couple of weeks, pattern stimulation of certain kinds, the visual system just doesn't function. It's not learning how to function; it knows how to function, but something has got to set the system in operation. That's true for innate processes altogether, so it's presumably true for language as well. So, for example, if a child were raised in isolation, with no noises, the faculty of language would presumably collapse. There's in fact even some evidence for that, from odd cases, unfortunate cases that have been discovered. But apart from that, it seems that we can at least begin by asking how far we can get toward genuine explanation by just looking at the first and third factor. Well, so what's the simplest computational operation possible? The simplest computation, which is found somewhere, buried in any computational system, is one that just takes two objects that have already been constructed and creates a third object from them. In the optimal case, the operation will not modify either of the elements
that it operates on; it'll leave them untouched, and it won't add any additional structure. So it just takes two objects, call them X and Y, and forms a new object Z, which doesn't affect X, doesn't affect Y, and doesn't add structure. That's basically binary set formation. So the simplest operation, which in recent years has been called Merge, is simply binary set formation, and the question we can ask is: if we can account for something in terms of Merge and third-factor properties like computational efficiency, have we in fact reached genuine explanation for the first time? Well, if you think about the operation Merge, binary set formation, it has two possible cases. We have X and Y, and we're forming the set {X, Y}. One case is that X and Y are distinct. The second case is that one of them, say X, is an element of Y, part of Y; the technical term is a term of Y. We define the notion "term" so that X is a term of Y if X is a member of Y or a member of a term of Y, a normal inductive definition, irreflexive. So one possibility is that X is a term of Y; the other possibility is that they're distinct. There's a third imaginable case, namely that they overlap, but if you think about the process of construction of binary sets, you can't have that case, so we can rule it out. So we have basically two options. The names that are used for them are external Merge, for two distinct elements, and internal Merge, where one is part of the other. The linguistic analogues would be, for example, forming "read books" from "read" and "books": that's external Merge. Internal Merge would be forming "what did you see" from "you saw what". Notice that what you actually form is "what you saw what": you take "what" and merge it with "you saw what", so you get "what you saw what". That's critically important. That's internal Merge. Internal Merge is dislocation, though "dislocation" is now a strange word for it, because you're actually keeping the original element. That's internal Merge; external Merge just takes two separate things and brings them together. Notice that these two are one operation, Merge; they're not two different operations, just the two possible cases of the single operation. There's a lot of confusion about this in the literature, so you really want to be careful about it. Well, just to be clear, the minimalist program is a program; it's not a theory. You read in the literature discussions of whether the minimalist program is true, or whether it has been refuted; that's meaningless. It's a program of research that asks how far we can get towards genuine explanation. That's the program. As I mentioned last time, in very recent years it has gained some independent motivation from what's been discovered about the conditions of evolution of the language faculty. To repeat quickly: it's now been established by genomic evidence that the language faculty was in place roughly at the time of the appearance of anatomically modern humans, and there's no evidence for symbolic activity before then. So we have an indication, not a proof, but an indication, that what happened was probably very simple, and apparently it hasn't changed since, since the language faculty is shared, first of all, among the earliest separating groups, and in fact, as far as we know, among all human groups. So we have independent reason to believe that the program is a plausible one: it's likely that the right answer to what the language faculty is will in fact be something like the simplest computational operation. Well, how far can we get on this? It seems to me that in the last couple of years there have been pretty substantial achievements which give an indication of how far we might be able to go. One achievement is cases where you have reached genuine explanation. One is simply the unification of the two fundamental properties of language that were recognized back in the fifties: composition and dislocation.
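The two cases of Merge described above, together with the inductive "term of" definition, can be written out in a few lines (an editorial Python sketch; the frozenset encoding is my choice, not part of the lecture):

```python
# Editorial sketch: Merge as binary set formation, with lexical items
# as strings and syntactic objects as frozensets.

def merge(x, y):
    """Merge(X, Y) = {X, Y}: a new object, leaving X and Y untouched
    and adding no extra structure."""
    return frozenset([x, y])

def terms(y):
    """All terms of Y: X is a term of Y if X is a member of Y or a
    member of a term of Y (irreflexive inductive definition)."""
    found = set()
    if isinstance(y, frozenset):
        for member in y:
            found.add(member)
            found |= terms(member)
    return found

# External Merge: the two objects are distinct.
vp = merge("read", "books")                  # {"read", "books"}

# Internal Merge: one object is a term of the other.
clause = merge("you", merge("saw", "what"))  # "you saw what"
assert "what" in terms(clause)               # so this Merge is internal
question = merge("what", clause)             # "what {you {saw what}}"
```

The search asymmetry is already visible here: external Merge has to find its two arguments among the whole lexicon and everything already built, while internal Merge only inspects the terms of one existing structure.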
Can they both be unified in Merge? I think the first person to point this out was Hisatsugu Kitahara, who's sitting over there. If you can bring these together, it's a pretty substantial achievement: these were always thought to be two totally separate parts of language, but maybe they're just the same operation, in fact the simplest operation. So maybe the null hypothesis, the perfect case, is that you have both dislocation and composition. Actually, you can push that a little farther. If you think about the two cases of Merge, internal and external, which one is the more primitive, the simpler, the one that involves the least computation? Actually, internal Merge. If you think about it, external Merge requires search: you want to bring together two things that are distinct, so you have to search all the available possibilities. Well, what are those? First of all, the entire lexicon, say maybe fifty thousand items for normal knowledge, plus anything that's already been constructed; this is a recursive procedure, so you're constructing things along the way, and any of them can be accessed for further operations. So external Merge involves a huge search procedure. Internal Merge, in contrast, involves almost no search at all: you're just looking at a particular structure and taking a look at its terms; that's it. So internal Merge is by far the more primitive operation; external Merge is much more complex. Which is an interesting conclusion, because the opposite has always been assumed. It's always been assumed that compositionality is the obvious case, and that dislocation is the complicated case, the imperfection, the strange property of language. It turns out, when you think it through, that the opposite is the case: internal Merge, dislocation, is the simplest case. Which means we have to ask: why do we have external Merge at all? Why not just keep to the simplest case? Well, think about that for a minute; I'll come back to it in a second. But that's what you discover when you begin to think about the nature of the
operations. Incidentally, there are some independent reasons for this, not proofs, but suggestive ones. Think about internal Merge alone. Imagine the simplest case: you have a lexicon of one element, and we have the operation internal Merge. I could write this down, but it's simple enough that you can figure it out in your heads. We have one element; let's just give it the name zero. We internally merge zero with itself; that gives us the set {zero, zero}, which is just the set {zero}. So we've now constructed a new element, the set {zero}, which we call one. Then we apply internal Merge again: we take the term of the set {zero}, that's zero, and we merge it with the set {zero}. We now have a new element consisting of two elements, zero and the set containing zero; we call that two. We just keep going: we've created the successor function. And in fact, if you think about it a little more, we've also created addition, because if you take one of these longer terms and merge it in, you're essentially getting addition; and multiplication is just repeated addition, unbounded repeated addition. So we basically have the foundations of arithmetic just from internal Merge. That raises a very interesting question, a question that very much troubled Charles Darwin and the co-founder of evolutionary theory, Alfred Russel Wallace, who were in contact. They were both concerned with what appeared to be a serious contradiction to the theory of natural selection, namely the fact that all human beings understand arithmetic. If you think about it, arithmetic has never been used in human evolution except in a very tiny period tacked on at the very end; for almost the entire history of human beings, nobody bothered with arithmetic. You might have small numbers, you might have what's called numerosity, knowing that 80 things is more than 60 things, but that's quite different from arithmetic: knowing that the numbers can go on forever, knowing what the computations are of, say, addition and multiplication. So where did this come from? Darwin and Wallace simply assumed that all humans have it, and by now there's pretty good evidence that they were right. Even if you find indigenous tribes that have no number words beyond maybe one, two, and three, and do handle numerosity, it turns out that as soon as they are inserted into, say, market societies, they immediately compute; they have the entire knowledge. And in fact, on conceptual grounds, we know that this must be the case, because there's no possible way to learn that the numbers go on forever. You can't have evidence for that, but every child just knows it. So the problem is: where did it come from? Since it was not selected, and it's universal, it looks like it contradicts the fundamental ideas of evolution. Well, a possible answer is that it's just an offshoot of the language faculty. The language faculty did emerge; it's here, we're using it, so it's around. And the language faculty, at the very least in its simplest case, gives you arithmetic. So it's quite possible that the reason everyone knows arithmetic is just that we've got the language faculty, and therefore we have the simplest case of it. Now, on the surface it looks as if there's counter-evidence, which has been brought up quite often, namely dissociations, the kind of thing Susan Curtiss discusses. There are dissociations between arithmetical competence and linguistic competence: there are people who have a perfect language faculty and no arithmetic, and conversely. However, it's not clear that that's a counter-argument, for reasons that were pointed out a couple of years ago by Luigi Rizzi, a great linguist most of us know. He pointed out that the tests for dissociation are testing performance; they're testing what people use, not the internal knowledge. For example, there are also dissociations between reading and language competence: there are people who can have normal language competence but can't read, and conversely.
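The arithmetic construction described a moment ago can be made concrete (an editorial Python sketch; the lecture's own steps give 1 = {0} and 2 = {0, {0}}, while this variant simply repeats the self-Merge step, so each number is one level of nesting deeper):

```python
# Editorial sketch: the successor function from Merge with a
# one-element lexicon. Merge is binary set formation; merging an
# object with itself yields {n, n} = {n}, a new object.

def merge(x, y):
    return frozenset([x, y])

zero = frozenset()               # the single "lexical item"

def successor(n):
    return merge(n, n)           # {n, n} collapses to {n}

def value(n):
    """Read a numeral off as its nesting depth."""
    count = 0
    while n != zero:
        (n,) = n                 # unwrap the unique member
        count += 1
    return count

one = successor(zero)            # {0}
two = successor(one)             # {{0}}
print(value(two))                # 2; and the series can go on forever
```

Addition then falls out as repeated application of `successor`, and multiplication as repeated addition, which is the sense in which the foundations of arithmetic come for free.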
You know it can't be that there's a separate reading faculty; that's impossible. So in fact what it shows is just that the tests are studying the utilization of the competence; they're not testing the competence itself. So it may very well turn out that this is the answer to the Darwin-Wallace problem, a deep problem in evolutionary theory. Going beyond that, there's other very suggestive evidence. Marvin Minsky, one of the founders of artificial intelligence and a good mathematician, a couple of decades ago, actually in the sixties, he and one of his students experimented with the simplest Turing machines (a Turing machine is just an abstract computer), the ones that have the smallest number of states and the smallest number of symbols, and just asked the question: what happens if you let these things run free? And they got an interesting result. Most of them crash: they either get into infinite loops or they just terminate. Of the ones that didn't crash, they gave the successor function. Which is strongly suggestive, and Marvin went on to draw the plausible conclusion: he said, probably nature, in the course of evolution, will pick the simplest thing, so it'll pick the successor function. He was thinking of things like extraterrestrial intelligence; he said we can expect that if you ever find anything like that, it will be based on the successor function, because that's the simplest one. Well, that's internal Merge. So from quite a number of points of view we have the conclusion that dislocation is probably the primitive element of language, and composition is the one that has to be explained. So why do we have external Merge altogether? Well, suppose we didn't have external Merge; what would language look like? Actually, it would be nothing but the successor function, because that's all that internal Merge can produce, the successor function plus addition. So if you had, say, ten elements in the lexicon instead of one and used only
internal Merge, what you would get is ten instances of the successor function, nothing else. But language has other properties, for example argument structure, theta theory; it has things like agent, patient, and so on. Well, you can't get that from internal Merge. You're required to have external Merge to develop the structures which will yield argument structure, theta structure. So an empirical property of language, namely that it has argument structure, forces you to use this more complex operation of external Merge. Another reason for it is, again, the existence of exocentric constructions, like, say, subject-predicate: you can't get those by internal Merge, of course. So there are just empirical reasons that force language to include the more complex operation of external Merge. Notice that this turns everything that's been believed in the generative enterprise, and the tradition, on its head. It's exactly the opposite of what's been assumed: the basic operation is dislocation, internal Merge, arithmetic essentially; then, because of special properties of language like argument structure and exocentric constructions, you're compelled to use also the more complex case, external Merge. But again, notice that these are both the same operation, just two cases of the same operation, one more primitive than the other; you're forced to use also the less primitive one, compositionality, because of properties of language. Well, notice again that in the case of internal Merge you not only get dislocation, but you have several copies of the element that has been dislocated. Those of you who know how this works know that you can have a great many copies: you can have successive-cyclic dislocation, which leaves a copy in every position, with consequences for externalization and interpretation. That's very important because of a phenomenon known in the linguistic literature as reconstruction, namely that the dislocated element is actually interpreted in the position from which it came and
even in intermediate positions. Standard examples are sentences like "it is his mother that every boy admires", meaning every boy admires his mother; "it is his mother that admires every boy" doesn't work, because "every" does not range over the variable "his". And the reason, if you think about it, is that what's reaching the mind actually has the phrase "his mother" in the position in which the quantification works: "every boy admires his mother" is actually reaching the mind, although not the ear. There are many more complex examples of this; there's no time, so I won't go through them, but those of you familiar with the literature know you get quite intricate examples of the phenomenon of reconstruction. Now, in a lot of the modern literature there is an operation of reconstruction that somehow takes the structurally highest element and reinterprets it back in the initial position, but that's unnecessary, because it comes automatically; it's just there. So what reaches the mind is the entire chain of elements, as it's called, all of the elements that have been dislocated. But it doesn't reach the ear: what reaches the ear cuts out all but one of them. Actually, interestingly, that's not entirely correct; we'll come back to it in a minute, but put that aside for a minute, I'll return to it in one second. And notice that when you look at these cases, you have a strengthening of the conclusion that was already suggested by X-bar theory, namely that the core language, the system that reaches the mind, that gives semantic interpretations, pays no attention to the way the thing appears on the surface. It just asks what happens in the internal operations. Things may appear on the surface, like linear order, or a missing copy, but the mind doesn't care, because it's only paying attention to the internal operations. And that raises the question: why does linear order even exist? Why do you have it?
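The chain just described, where every copy reaches the mind but the ear hears only the highest one, can be sketched as follows (an editorial Python illustration; the deletion rule here is deliberately simplified, and as noted above it is not entirely correct for the real system):

```python
# Editorial sketch: internal Merge leaves a copy in place; semantic
# interpretation sees the whole chain, while externalization deletes
# all but the structurally highest copy. Nested lists are used here
# purely for readability.

def internal_merge(struct, term):
    """Merge a term of struct with struct itself, keeping the
    original occurrence in place (the 'copy')."""
    return [term, struct]

clause = ["you", ["saw", "what"]]
question = internal_merge(clause, "what")   # [what, [you, [saw, what]]]

def interpret(struct):
    """The mind receives the entire chain: nothing is deleted."""
    return struct

def externalize(struct):
    """Pronounce only the highest copy of each element, deleting
    the lower copies encountered in the top-down walk."""
    seen = set()
    def walk(s):
        if isinstance(s, list):
            return [w for w in (walk(x) for x in s) if w]
        if s in seen:
            return None                     # delete the lower copy
        seen.add(s)
        return s
    return walk(struct)

print(externalize(question))   # ['what', ['you', ['saw']]]: "what you saw"
```

The point about reconstruction is that `interpret` keeps the lower copy of "what" in object position, which is why the dislocated element can be interpreted where it came from without any extra operation.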
There's a trivial answer to that: the sensorimotor system requires it, at least partially. If you use a different sensorimotor system for externalization, like sign language, it doesn't have strictly linear order. The reason is that visual space provides options that acoustic production does not have: you have to speak one thing after another, but if you have visual space you can use positions in it to convey the meaning, and that's in fact done in sign language. So anaphora in sign language works by picking a point in space and referring back to it; you can't do that in spoken language. You can also have simultaneous operations in sign language, like raising your eyebrows while you're signing, which turns something into a question; you can't do that in spoken language either. So in general, the nature of the sensorimotor system is determining properties of language in the broad sense which are not part of the core language at all; they just reflect the characteristics of the sensorimotor system. Well, now, there's something that came up in the discussion last time; I'll repeat it. The sensorimotor systems have nothing at all to do with language. They were in place long before, in fact millions of years before, language ever evolved. So by the time language evolved, maybe roughly a couple hundred thousand years ago, the sensorimotor systems were in place, independent of language. And they haven't changed with the usage of language; they're still what they were at the origins of Homo sapiens, with extremely minor variations. There's no indication of any adaptation of the sensorimotor systems to language. So the early Homo sapiens who developed this internal system, by whatever means they did, some rewiring of the brain which gave the simplest computational operation, had the task of getting this stuff out somehow, and they had to use the sensorimotor systems that were around, as we do, as every infant does. And the sensorimotor system imposes conditions, but those are not
linguistic conditions; they're not, strictly speaking, part of language. From which we can conclude, and it's a contentious claim, as Hilda pointed out last time, but it looks from this kind of argument as though things like linear order or elimination of the copy just don't have anything to do with language. They're conditions imposed by the mode of externalization, which is language-independent. And as I mentioned last time, this has the paradoxical conclusion that just about everything that's been studied in linguistics for the last 2,500 years is not language; it's the study of some amalgam of language and the sensorimotor systems. But if we really think it through, the internal language, the thing that's computing away and giving you semantic interpretations, doesn't have any of these things. It doesn't have operations of reconstruction, doesn't have linear order; it's just operating along, producing thoughts in your mind, which you can then try to externalize somehow. Well, in fact, if you think about it, that's even true of what's called inner speech. Almost all the time, say 24 hours a day, you're basically talking to yourself. You can't stop it; it takes a tremendous act of will to keep yourself from talking to yourself. But if you think about inner speech, it's externalized speech: it has linear order, for example, and you can think about two sentences you just produced in your mind and ask, do they rhyme, are they the same length, do they have copies, and so on. The answer is they don't. So what we call inner speech is very superficial; it's not what's going on in the mind. All of this has a lot of implications for the study of consciousness and preconscious mental operations. Almost everything that's involved in language seems to be inaccessible to consciousness; it's prior to whatever reaches consciousness, even inner speech. There are a lot of implications to this which are worth thinking through. Well, a conclusion that kind of begins to appear on the horizon is
that core language, the internal computations yielding linguistically articulated thoughts, could be universal. It's possible that it's just common to the species, part of the faculty of language, which would solve the learnability problem, because nothing would be learned if that's true, and it would solve the evolvability problem insofar as you can reduce the operations to the simplest computations. Okay, so that's a kind of goal that you can see formulating itself on the horizon, something to strive at in research. You would also anticipate, if this is correct, that the variety of languages is just apparent, just a surface phenomenon, not really a fact about language. I mentioned last time that in the structuralist-behaviorist period it was assumed, almost as a dogma, that languages can differ from one another pretty much arbitrarily, and that each one has to be studied without any bias, any preconceptions; that was what was called the Boasian doctrine. If this is correct, that's just completely false: there's maybe even just one language, with a lot of variation that comes from mapping it onto externalization systems. You would expect the same to be true of the complexity of language. Relating two independent systems is a complex affair, so you could naturally expect the complexity of language to reside there also. And languages change very fast; in fact, every generation there's language change. It could be that this too is just externalization. Anyway, those are goals to be thinking about, kind of coming into view when you just think about the nature of the problem. Well, under externalization, as mentioned before, the mind hears every case, but the ear only hears one. So when you say, say, "it is his mother that every student admires," you don't hear "it is his mother that every student admires his mother." That second "his mother" isn't hitting the ear, but you know how to interpret it; the mind hears it, but it's
not hitting the ear. And that's in general true. Actually, there are some very interesting qualifications to this. It's known that in some languages the position that you don't hear actually has some kind of a phonetic mark, some sort of a sound that indicates "I was here." There's a study by the Vietnamese linguist Tue Trinh who goes into this in some detail and relates it to prosodic properties of the language, which pretty well determine when this is the case. More intricate, for those of you who know this stuff: in successive-cyclic movement, when you have successive dislocation to higher and higher positions, there are in many languages strange residues along the way. That's true at both of what are called phases, the CP level and the v*P level: in some languages you get traces that something went through there. So there is some indication in some languages, in the externalization, that something was there, but overwhelmingly you just don't hear anything; you hear one case, typically the structurally highest one. There are also what are called in-situ languages, like, say, Chinese and Japanese, where you hear it in the base-generated place, the place where it started from. Actually, that's true in languages like English too: you have in-situ constructions in English and similar languages in very special circumstances, which are interesting. But overwhelmingly, languages vary this way. So the question that arises is: why don't you pronounce everything? Why bother deleting? Well, here we turn to a third-factor property, computational efficiency. Suppose you were to pronounce all of them. Notice that the things you would pronounce, I gave a simple example, "what did you eat," but instead of "what" it could be of arbitrary complexity, so, say, "which book that appeared in 1690 and then was destroyed did you read." If you repeat that, you've got a lot of computation going on: the mental
computation to set the stage for phonetic articulation, and then the motor control to articulate the whole thing. So you have a massive reduction in complexity if you just don't pronounce these things. Okay, so computational efficiency compels the deletion of the copies. You can't delete them all, of course, or else there's no indication that the operation ever took place; you have to leave at least one, typically the structurally highest one, and the rest you delete. Well, that's fine from the point of view of meeting the condition of computational efficiency, but it has a consequence for language use: it causes problems for perception, serious problems in fact. Those of you who've worked on mechanical parsing programs know that one of the toughest problems is what are called filler-gap problems. You hear a sentence which begins with "which book," then comes the whole sentence, and you have to find the place where "which book" is interpreted, but there's nothing there. That's called a filler-gap problem. These can be pretty tricky; it's simple in the case I gave, but if you think through more complex cases it can become quite intricate. So we have the following strange situation: computational efficiency is forcing communicative inefficiency. Computational efficiency is forcing parsing problems, perception problems, communication problems. What does that suggest? It suggests that the way language evolved and is designed, it just doesn't care about efficiency of use; that's of no interest. It just cares about efficiency of computation. Now, there's a lot of evidence for this in things that are quite familiar to us but that nobody bothers thinking about. Take structural ambiguity, examples like "flying planes can be dangerous," which can mean either that the act of flying planes can be dangerous or that planes that fly can be dangerous. There's lots of structural ambiguity in language. Where does it come from?
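The filler-gap problem can be sketched concretely. This is a toy resolver under invented assumptions: the tiny lexicon of transitive verbs and the naive left-to-right search are hypothetical stand-ins, not a real parsing algorithm.

```python
# Toy filler-gap resolution: the hearer gets a fronted phrase (the
# filler) and must locate the silent position (the gap) where it is
# interpreted, because the deleted copy leaves nothing audible.
# The lexicon and search strategy are invented for illustration.

TRANSITIVE_VERBS = {"read", "eat", "admire"}  # verbs needing an object

def find_gap(words):
    """Return the index just after a transitive verb that has no
    overt object following it: the position of the deleted copy."""
    for i, w in enumerate(words):
        if w in TRANSITIVE_VERBS and i + 1 == len(words):
            return i + 1  # clause ends right after the verb: gap here
    return None

sentence = "which book did you read".split()
filler, rest = sentence[:2], sentence[2:]
gap = find_gap(rest)
# "which book" is interpreted in the gap after "read"; a real parser
# must search for that position, which is what makes the problem hard
print(filler, "-> gap at position", gap)
```

Even this toy version shows why copy deletion is cheap for the speaker but costly for the hearer: the speaker simply omits material, while the hearer has to run a search for a position that carries no acoustic signal at all.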
If you look at the cases, it comes from just allowing the rules to run freely, not caring whether that causes perception problems or not. But all structural ambiguities are perception problems; you have to figure them out. Or take what are called garden-path sentences, discovered by Tom Bever, sentences like "the horse raced past the barn fell." When you present this to people, they're confused: what's "fell" doing there? "The horse raced past the barn" was a sentence. It's called a garden-path sentence because it leads you down the garden path, but if you think about it, it's a perfectly grammatical sentence: it could be that the horse that somebody raced past the barn fell. Okay, well, these are all over the place. In fact, incidentally, if you run the Google parser on them, the one you can pick off the internet, it fails all the time on these. But they're normal parts of language, and again, these are just cases where the rules run free and don't care whether they cause problems or not. Actually, the most interesting case, too complicated to go into here, though many of you are familiar with it, is islands: things where you just can't extract, where you can think them but you can't produce them. The cases of islands that are understood, and there are a lot of problems with them, all turn out to be cases where computational efficiency is forcing the island, compelling a paraphrase, often a complex paraphrase, to somehow get around it. Well, the general conclusion seems to be that, as far as I know, whenever there's a conflict between computational efficiency and communicative efficiency, computational efficiency wins; the language just doesn't care about communicative efficiency. That also has a consequence. There is a very widely held doctrine,
almost a dogma, that somehow the function of language, you know, how it evolved, its basic design and so on, is as a system of communication. And apparently that's not the case; apparently language just doesn't care about communication. Of course it's used for communication, and usefully so, but that just seems to be something on the side. Probably the background for this doctrine is, first of all, that we do use language for communication all the time, so on the surface that's what you see; and also a kind of belief, going back to early Darwinian notions, that anything that evolved had to go by small steps from earlier things, so somehow language should have evolved from animal communication systems. But that seems to be just entirely false: language seems to have come from something totally different, having nothing to do with communicative systems. Well, there's a lot to say about this, but time is running out; I've lost track of time. Okay, well, let me just go on a little bit, because there's a lot I want to get to. There are other genuine explanations, some of them pretty striking, for non-trivial properties of language. The most interesting one of all, I think, is what's called structure dependence. It's a strange, puzzling property of language, noticed years ago: the operations of language simply don't seem to pay attention to linear order; they only pay attention to structural relations. The standard example in the literature is sentences like "the man who is tall is happy." If you turn that into a question, it becomes "is the man who is tall happy," not "is the man who tall is happy." Okay, that's obvious; everybody knows that. Well, as soon as generative grammar began, the question arose: why? Why don't you use the simple computational operation of picking the first occurrence of the auxiliary and putting it in the front? It's a much simpler operation than the one that's used. The one that's used in fact picks the structurally closest one. So if you draw, say,
the tree structure of this, you see that the one that's picked is the structurally closest one. Similarly, take a sentence like "can animals that fly swim." We understand that to mean "can they swim," not "can they fly." It doesn't mean "is it the case that animals that can fly also swim." Why doesn't it mean that? It's a perfectly fine meaning, and it's what the sentence would mean if you picked the first occurrence of "can" and moved it, or interpreted it there. But it's not a mistake that anybody can possibly make; no child makes it. In fact, it's by now known that by about 30 months old, infants are already operating with structure dependence. That's something that's just automatic; you have essentially no evidence for it, and this is true for all constructions in all languages, as far as anybody knows. Well, all these are things we take for granted, but the usual question arises: why? It's particularly puzzling for two reasons. For one thing, we avoid what looks like the computationally simple operation: linear order is trivial, picking the first thing is trivial. We avoid that and use what looks like a much more complex operation. The second reason it's puzzling is that we ignore everything we hear. What we hear is linear order; we don't hear structure. That's something the mind constructs internally. So we ignore everything we hear, we ignore what looks like the simplest operation, and we do it universally. And this is true across cases. Examples like "is the man who is tall happy" have been a little misleading: they have led cognitive scientists to believe that maybe you could learn this, because you have data for it, and there's a huge literature trying to show that one way or another you might acquire it. But there are plenty of cases where the evidence is just zero, a total of zero. Take for example the sentence "the guy who fixed the car carefully packed his tools." It's ambiguous: he could have fixed the car carefully, or carefully packed his tools.
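The contrast between the linear rule and the structural rule can be sketched as a small program. The bracketing below is a deliberately simplified, hand-made constituent structure, not a full syntactic analysis, and the tags `is[1]`/`is[2]` are just invented labels to tell the two auxiliaries apart.

```python
# Toy contrast for "the man who is tall is happy":
#   linear rule: pick the first auxiliary in the string (wrong:
#     it grabs the relative-clause "is", yielding the impossible
#     *"is the man who tall is happy")
#   structural rule: pick the auxiliary closest to the root of the
#     tree (right: the main-clause "is")

WORDS = ["the", "man", "who", "is[1]", "tall", "is[2]", "happy"]
TREE = (("the man", ("who", ("is[1]", "tall"))), ("is[2]", "happy"))

def linear_first_aux(words):
    """Pick the first auxiliary in linear order."""
    return next(w for w in words if w.startswith("is["))

def structurally_closest_aux(node, depth=0):
    """Pick the auxiliary at the shallowest depth in the tree,
    returning a (depth, word) pair."""
    if isinstance(node, str):
        return (depth, node) if node.startswith("is[") else None
    hits = [h for child in node
            if (h := structurally_closest_aux(child, depth + 1))]
    return min(hits) if hits else None

print(linear_first_aux(WORDS))            # the embedded auxiliary
print(structurally_closest_aux(TREE)[1])  # the main-clause auxiliary
```

Note that the structural rule never consults the left-to-right position of the words at all; it only walks the hierarchy, which is the sense in which the operation ignores everything the ear actually hears.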
But if "carefully" appears in front, "carefully, the guy who fixed the car packed his tools," it's unambiguous: you unambiguously pick the more remote verb, not the closest verb. The evidence for that is zero; there's no way you can learn it by deep learning on the Wall Street Journal corpus or anything like that. And this is universal in language. So there is a puzzle, and there's in fact an answer. The answer is that language is based on the simplest computational operation, namely Merge, which has no order. So the option of using linear order just isn't there in the internal core language. Here's a situation where there's no learning necessary, because it's just innate, and there's no evolvability problem, because it's the simplest computational operation. So you have a genuine explanation of quite a deep, strange property of language: structure dependence, no linear order. There happens to be confirming evidence from other sources. In this case there are neurolinguistic studies, initiated by the linguist Andrea Moro with a group of brain scientists in Milan, who devised an excellent study. They took two groups of subjects, each of which was presented with an invented nonsense language. One of the languages was modeled on a real language that the subjects didn't know; the second used rules that violate universal grammar, like rules based on linear order. So, for example, in the second, invented language, the negation would be, say, the third word in a sentence, which is very trivial to compute; in the one modeled on a real language, negation was whatever negation is, which is pretty complicated when you look at it. Well, what they found is that the subjects who were presented with the invented language modeled on a real language handled it with activation in the normal language areas of the brain, mostly Broca's area. The subjects presented with the language in which
negation was the third element didn't get Broca's area activation; they got diffuse activation over the brain, which indicates it was just being treated as a puzzle, not a language problem. Similar evidence was presented by the linguists Neil Smith and Ianthi Tsimpli, who worked with a particular subject, Chris they call him, who has very low cognitive capacities, extremely low, but remarkable language capacities; he picks up languages like a sponge, basically. When they presented him with the invented language based on a real language, he learned it very quickly; when they presented him with the one that has, say, negation in the third position, he couldn't handle it at all. It's a puzzle, and he can't deal with puzzles. So here's a very striking case, a rare case, where we have the optimal explanation for something, a perfect explanation, a genuine explanation, of a fundamental property of language, and we have confirming evidence from all the other relevant sources: language acquisition, where it's known right away; neurolinguistics, where you get the right kind of brain activation; psycholinguistics. You couldn't ask for a better result. Now, here's a very curious fact about the field of cognitive science altogether. There's a huge literature, in fact one of the major topics in computational cognitive science is trying to show how this could be learned somehow from data, how it could be learned without what's called an inductive bias for structure. Now, if you look at this literature, first of all, the methods that are proposed, insofar as they're clear enough to investigate, all of course fail. But a more important fact is that it wouldn't matter if they succeeded. Suppose you could, say, take the Wall Street Journal corpus and use some deep learning method and figure out from it that speakers of English use the structural rule. The significance of that would be absolutely zero, because it's asking the wrong
question. The question is: why is this the case? In fact, why is it the case for every construction in every language? So even if, conceivably, you could show how it could be learned, it wouldn't matter; it's totally irrelevant, it's the wrong question. What about the inductive bias? There's no inductive bias; it's just the simplest explanation. So it's as if this entire effort, and those of you who know the literature will know that this is a huge part of computational cognitive science, is an effort to try to disprove the null hypothesis. Now, if you think about it, things like that just don't happen in the sciences; it's just not the kind of thing you undertake. If there's a perfect solution for something, you don't try to find some complicated way in which you could have gotten it some other way; it doesn't make any sense. It raises serious questions about the nature of the field; it's a strange departure from the sciences. There are other successes, but starting next time I want to turn to something else: problems with Merge, places where it really doesn't work, and what we can do about that. [Applause]