5/1/2019 Chomsky Lecture with Q&A

Welcome, everybody, to the third lecture in Noam Chomsky's lecture series for this UCLA linguistics mini-course. There will be a question period after the talk. Without further ado, I give you Noam Chomsky.

Well, yesterday I talked about some of the things that work out, some of the successes in finding genuine explanations, a term that I had discussed. That's actually something really new in the history of the subject: until recently even the concept of genuine explanation wasn't well understood, and there were no examples of it. But one of the striking things of the past few years is that there are some genuine explanations, in the sense that I discussed, for some quite fundamental properties of human language, and there are some rather surprising consequences which overturn long-held beliefs. That was the successes. Today I want to talk about some of the failures, some of the things that don't work, what they imply, and some of the ways around them, and we'll see that these too have some curious, unexpected, and rather interesting consequences.

So let's start with X-bar theory. X-bar theory had some successes and some failures. One of the successes was that it eliminated the major problems of phrase structure grammar: the fact that phrase structure grammar permitted vastly many rules that are totally unacceptable, which couldn't be tolerated. Also, phrase structure grammar conflated three distinct properties of language: compositionality, linear order, and identification of the categories, projection. X-bar theory separated linear order from the other two, which has lots of consequences that I discussed last time. There were problems. One: it still conflated two distinct concepts, joining things together and identifying their category. Secondly, and even more fundamentally, it ruled out exocentric constructions, which abound in language; they're all over the place. That required all sorts of artifices to try to figure out how to get what you knew was more or less the reasonable answer, but without any justification for it.

So we want to get rid of that, and the simplest way to get rid of it, the most principled way, is to assume the strong minimalist thesis and see how far you can go from there. That means you take the simplest compositional operation, the one that has to be embedded in any computational procedure, and see how much can be explained in those terms. I discussed some of the successes last time. However, there's an immediate problem: what about exocentric constructions? Suppose you're constructing a subject-predicate construction. The parts have to be built in parallel: you have to construct the verb phrase and the subject noun phrase before you can merge the two together. That means you have to have a workspace in which things are constructed. The workspace will consist of, first of all, the atoms of computation, and anything constructed by the rules. So the workspace is just the set including the atoms and anything constructed by Merge.

So, a little notation. Let's call the operation capital MERGE, because it's different from the concept that has been used in recent years. MERGE is going to take two things, call them P and Q, and the workspace: it's an operation on the workspace, which it is going to change. As a notation, the workspace is a set, the set which includes everything that's been constructed, but it's a different kind of set from the sets that are formed by Merge: it's not accessible to computation; you don't merge the workspace itself with things. Just to indicate that, I'll use a different notation: square brackets for the set that is in fact the workspace. The result of the operation will be merging P and Q, and then a bunch of other things, and that will be the resulting workspace. That raises the question: what's here, what's the rest of this stuff?

Well, let's take the simplest case, the case where P and Q are the only elements in the workspace. The simplest case is a workspace that consists of just P and Q. Then MERGE yields the set {P, Q} — and what else? If this were normal recursion, the kind of recursion that applies generally, the rest of the workspace would include P and Q as well. For example, if you're doing proof theory, general recursion, you have axioms and rules of inference; if you apply the rules to either the axioms or some theorem you've already constructed, you get a new theorem, and what's left, what is accessible to further operations, is everything that's already been produced: the axioms and every previous theorem. That would essentially be saying you include the set {P, Q} you've just formed, and also P and Q. However, that fails for human language, and it fails in a very straightforward way. If you were to do this, you could take this P here and turn it into something of arbitrary complexity, violating every imaginable island and any other property, and then merge it; you would now have violated every linguistic property — all islands would be violated, no matter how radical. So that tells us there's something different about recursion in human language as opposed to general recursion: we have to somehow bar the things that were already there; they can't remain accessible in the workspace.

That raises interesting questions. Actually, I should say that the original definition of Merge back in 1995 incorporated this property, but without recognizing it. If you look back at it, it was an operation that said: replace P and Q by the set {P, Q}. And the formalizations of this, if you look at them, did add an operation, an operation Remove — get rid of things that you've already had and don't want. But we don't want to have extra new operations; it should be that the right concept of capital MERGE just has this property. That raises the question: why do we have that property? Why should language have the property that it deviates from normal recursion by getting rid of the things that had been generated, so that you're not allowed access to them anymore?

Well, that raises an interesting question. We would of course like to reduce this to some general principle, essentially a third-factor property. Remember that in the study of growth and development, acquisition of language in particular, there are three factors you have to look at. One is what's genetically determined — that's UG. The second is whatever the data happen to be. And the third is independent principles, principles that hold independently of the operation in question; that would include laws of nature and special properties of the system that is carrying out the operation — in our case, some properties of the brain which make it language-ready. That's been noted for a long time, but nothing was ever said about the relevant properties of the brain, about which very little is known in fact. What we would look for in this case is some property of the brain that forces this.
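Just to fix ideas, here is a minimal sketch — entirely my own illustration in Python, not anything proposed in the lecture — of the difference just described: a general-recursion step leaves P and Q accessible alongside the new object, while capital MERGE replaces them, carrying everything else over unchanged. The function names and the list-of-objects representation of the workspace are assumptions made only for the sketch, and it covers only the external-merge case.

```python
# Toy illustration only: the workspace is modeled as a list of syntactic
# objects; nested tuples stand in for the unordered sets built by Merge.

def general_recursion_step(ws, p, q):
    """Proof-theory-style recursion: everything generated so far stays
    available, so P and Q remain in the workspace beside the new object."""
    return ws + [(p, q)]

def merge_ws(ws, p, q):
    """Capital MERGE (external-merge case): P and Q are replaced by
    {P, Q}; anything else in the workspace is carried over unchanged."""
    rest = [x for x in ws if x not in (p, q)]
    return rest + [(p, q)]

ws = ["the", "boy"]
print(general_recursion_step(ws, "the", "boy"))   # ['the', 'boy', ('the', 'boy')]
print(merge_ws(ws, "the", "boy"))                 # [('the', 'boy')]
```

On the first line P and Q stay accessible and could each be built up into an island-violating structure before being merged again; on the second, only the new object {P, Q} remains available for further operations.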
That would be a third-factor principle; that would be the optimal conclusion. Before coming to that, notice that there are already, embedded in linguistic theory as it has developed over the past years, some examples of this. One case is that Merge in the old sense never increases the workspace: Merge itself doesn't increase the workspace unless it happens to be bringing in a new lexical item. So the old definition just didn't allow arbitrary expansion of the workspace. More interestingly, conditions like the Phase Impenetrability Condition — I apologize in advance to a number of you in the audience; I'm going to assume familiarity with recent linguistic work, with no time to go into the details — the Phase Impenetrability Condition, which is a fundamental condition with many important consequences, actually has the property of reducing accessibility to computation: it says there are things you've produced that you can't see anymore. That restricts the resources available for computation. Another case is simply C-command. If you think about successive movement of something — if you're doing successive-cyclic A-bar movement, say — we take it for granted that the next move will pick the top element, not the lower elements. What that means is you're using essentially minimal search, a computational principle, a third-factor principle, that says look for the shortest thing and don't go any further. Again, that reduces accessibility. But we need something beyond this. A number of things have already been proposed that reduce accessibility and hence limit the richness of computation, but we want something beyond that.

And actually there is something highly suggestive about neural computation which has been noticed elsewhere. The most striking property of neural computation is that the brain is very slow; it's a very slow computer, and that shows up in pretty dramatic ways. I'll quote from Sandiway Fong, a cognitive scientist who's a colleague at Arizona: "the marvel that we call the human brain is actually the weak link in our cognitive apparatus; our sensory apparatus far outstrips the brain's capacity to process the high-resolution input to the sensory organs." The eye responds at the single-photon level, the minimal possible detection; the eardrum responds to vibrations at a level smaller than the diameter of a hydrogen atom — not for everybody, incidentally; I'm an exception — but in principle. If you think about that, it's pretty remarkable, and it's the same for every sensory system: they're essentially perfect, physically perfect. The brain can't handle that, so almost all of the computation the brain is doing is throwing out information; the brain is incapable of handling the amazing capacity of the sensory systems to be optimal, physically optimal detectors, better than anything that can be constructed. It does raise a strange question: why did Mother Nature bother to produce these astonishing systems if the next step is to get the brain to throw out all the information that they produce? That's not just true for humans; it's true of all organisms. It's a question you could ask why that happened. Probably there's some basis in physical law, some unknown principle, that explains that if you're going to have sensory systems you have to make them perfect; but what the reason for that is is unknown. It just is a fact.

Anyway, the brain is essentially throwing out almost everything that comes in, which makes it very tempting to think that the resource limitation that's part of language computation — and presumably part of the computation done by any organism — is just a special case of the fact that the brain is so slow and incompetent that it wants to get rid of tons of information. That's a plausible candidate for the conclusion that one of the properties of linguistic recursion, as distinct from recursion in general, is simply that it's got to work through the brain: whatever the properties of the brain are that implement the linguistic computation constitute a third-factor condition which is just imposed on language, and presumably on any organic computation.

Well, if you think back a step, we've been trying all along, in pursuing the concept of genuine explanation, to restrict the amount of computation that takes place — to find the simplest theory, the one with the fewest mechanisms, the least computation. There were good reasons for that, which I discussed. One is just the general fact that simplicity of theory corresponds to depth of explanation, which is what you're trying to achieve in any serious scientific inquiry. Secondly, there's the Galilean precept, which has been spectacularly successful, about nature being simple. And third, in the special case of language, there are the empirical conditions on language evolution, which we don't know for certain, but which seem to show, as I discussed, that it was a very sudden, immediate development which hasn't changed since, and of course very recent in evolutionary time. All of which converges to suggest that computation should be restricted to the minimum. Here we have a second point: not only should computation be reduced to a minimum, but the resources available to computation — the set of elements accessible to the operations — should also be reduced to a minimum. So we seem to have a very broad principle of reduction of computation, broader than normal explanation in the sciences: not just that the means are restricted to the minimum, but also that the resources accessible to the means are restricted to the minimum. That has a lot of consequences. One consequence, already mentioned, is that it tells us the workspace is going to be as small as possible.

Let me just make a comment on notation. We want to say that the workspace which is a set containing just X is distinct from X itself: we don't want to identify a singleton set with its member here, because if we did, the workspace itself would be accessible to Merge. However, in the case of the things produced by Merge, we want to say the opposite: we want to identify singleton sets with their members. This is actually something that goes back to phrase structure grammar. In phrase structure grammar you didn't allow rules like X rewriting as X; nobody said why, it was just the kind of thing you didn't do — for one thing, it would get you into infinite loops, and you don't want that — so it was just barred. But there was a trick used in phrase structure grammar, an illegitimate trick, when you wanted to reduce VP: if you had a verb phrase that was only a V, it would be described with a non-branching projection (I'm using the older notation, NP rather than DP, for the nominal case). That's barred by X-bar theory: in X-bar theory you don't have a VP if it's only a V. That was taken for granted, and it actually has consequences. For one thing, for Richard Kayne's Linear Correspondence Axiom, the LCA, it was necessary to introduce a VP node if you want to get the asymmetry between a subject and a verb; but that's illegitimate, so it's a problem for the LCA. And there are other consequences.

Carrying that over into the Merge framework, what it comes down to is this: in the case of the workspace, we do not identify a singleton set with its member; in the case of something constructed by Merge, yes, we do identify a singleton set with its member. That has quite a number of consequences.

Now, the simplest case that I just mentioned actually provides a criterion for determining what are the proper and improper forms of Merge. We have to ask, for every notion of Merge that's been developed, every case that's been proposed, whether it yields illegitimate derivations; if it does, we have to throw it out. So take something that I think nobody's ever proposed, just to imagine it: suppose we have some object, call it X, and we want to merge A and B inside it. That's going to give us the set {A, B}. There's nothing in the original definition of Merge that blocks that; it's just something nobody ever wanted to do because it doesn't make any sense. But suppose you do it. Then, same story: X can become arbitrarily complex — call the result Y — and adjoining Y, we've violated all conditions. So this is ruled out because it allowed too much accessibility. We want a definition of Merge that restricts accessibility, so you won't be able to get things like that.

Well, that was something that's never been proposed, but let's take things that have been proposed and are very widely used in linguistic description — like, for example, what's called parallel merge. Parallel merge is usually described like this: you construct, in tree notation, a structure in which the element B is part of {B, C} and also part of {A, B}. That's called parallel merge. I should make a comment at this point: tree notations are kind of convenient, but they're very misleading, and you should really pay no attention to them. For one thing, a tree notation leads you to think that there's got to be something at the root of the tree; but that's conflating compositionality with projection, and in fact you often just don't have anything at the root of the tree — for example, in every exocentric construction. That's what labeling theory takes care of; it eliminates that conflation. Another reason is that when you draw trees it looks easy to do lots of things that don't make any sense. Especially with contemporary computer graphics, you can draw funny lines from one point to another and connect two things, or you can tack something in lower in the tree and call it late merge, and so on, and it seems to sort of work out by playing around with trees; but if you try to translate it into Merge, it doesn't make any sense.

So let's take parallel merge. What it really means is this: you started with {A, B} — this is the workspace — and you also had C. What you did was merge B with C, which gives you a new workspace, WS-prime, which contains the original object and a new one. And here we have the old problem: we can turn the original into something as wild as we want, and then we can merge this to it, and we have a connection between the two which violates every possible condition. So parallel merge looks as if it's easy to draw, but it can't be a legitimate operation, because again it runs into this paradigmatic problem, which tells you that you're proposing an operation which is going to yield illegitimate objects, objects that violate every imaginable syntactic condition. So it has to be ruled out.

Now, those of you who know the literature know that parallel merge has been used to give lots of interesting results. It is the basis for trying to construct what are called multidominance constructions, where you have a mother connected to different things down below; that's been used for across-the-board deletion and for lots of other things, to give accounts of them. Now, in the literature those are called explanations, but they're not: they're descriptions. They're interesting descriptions, which have the property that I discussed earlier: they're ways to impose some kind of organization on chaotic data, which is often very useful — you can say, well, now it's more organized than before, so we can look and see if there's a possible way of accounting for it — but it's not an explanation; it's a way station on the way to a conceivable explanation. So there's got to be something about the definition of Merge that tells you you can't create parallel merge, and if you think about it for a second you can see what it is: there are too many accessible things being produced. The right definition of Merge, when we get to it, should allow only one new accessible object, namely the one you're constructing: when you put P and Q together, you're constructing a new object, the set {P, Q}; that's accessible to computation, but nothing else should be. That should be the right definition of Merge. Parallel merge is adding two new accessible objects, and that's too many. The right definition of the computation should permit only the minimal number, namely one new accessible object. And again, I suspect that goes back to the general property of the brain: it's just very slow; it has to throw out information.
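Purely as an illustrative aid — my own toy model, not anything from the lecture — the "too many accessible things" diagnostic can be made concrete by counting accessible occurrences (every object in the workspace together with every occurrence of its subterms) before and after a proposed operation. Nested tuples stand in for the unordered sets, and the helper names are invented for the sketch.

```python
# Toy diagnostic: count accessible occurrences (each object plus every
# occurrence of each of its subterms) in a workspace.

def occurrences(obj):
    if isinstance(obj, tuple):                 # a set built by Merge
        return 1 + sum(occurrences(part) for part in obj)
    return 1                                   # an atom

def ws_occurrences(ws):
    return sum(occurrences(obj) for obj in ws)

AB = ("A", "B")
ws0 = [AB, "C"]                                # workspace [{A, B}, C]: 4 accessible occurrences

# Parallel merge: B (inside {A, B}) is merged with C while {A, B} stays put.
ws_parallel = [AB, ("B", "C")]
print(ws_occurrences(ws_parallel) - ws_occurrences(ws0))   # 2: the set {B, C} and a new occurrence of B

# Legitimate external merge of {A, B} with C: both are replaced by {{A, B}, C}.
ws_external = [(AB, "C")]
print(ws_occurrences(ws_external) - ws_occurrences(ws0))   # 1: only the new object is added
```

By this crude count, parallel merge adds two new accessible items — exactly the objection in the text — while the legitimate step adds only the object being constructed. (Internal merge, which creates a copy, would need the copy relation to be factored in, and is not modeled here.)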
Actually, before going on, I might mention that this is kind of familiar in other domains of language acquisition. When you study the acquisition of phonology in infants, what you find is that every infant is of course capable of all the phonemic distinctions in any possible language, but by about a year old, maybe even earlier, most of them have just been lost: what the early stage of phonological development is doing is saying, throw out all the possibilities and just leave us with these. When you study critical periods in language acquisition it's pretty much the same: at these apparent critical periods, lots of stuff is just thrown out that you're not allowed to look at anymore; that's what makes it a critical period. Or take a theory of language acquisition like, say, Charles Yang's, which assumes that the child starts with all the possible I-languages and, as data comes along, the probability distribution over the set of I-languages changes: some data comes along and says, okay, lower the probability of this one, raise the probability of that one. Ultimately, in the course of language acquisition, it need not converge on a single point, but it gives you a skewed probability distribution in which many things are way down there — you're not going to bother with them — and a few, or maybe a couple, of things have high probabilities; you hang on to those, and that's your I-language. You could again describe that in the same terms: what the acquisition system is doing is simply throwing out lots and lots of data, which we know the brain does massively because of the extraordinary sensitivity of the sensory organs.

Well, anyhow, parallel merge goes, and with it all of the very interesting empirical consequences that have been produced on the basis of parallel merge. Those consequences now remain as puzzles, not as things that have been explained — as organized data which we now want to look at. Take what's called sideward merge: the same problem. I won't run through it, but if you formulate sideward merge not just as a tree, with a line going from here to here, but in terms of actual Merge, it has exactly the same problem. So this very simple problem turns out to be a very good diagnostic to tell you whether some proposed concept of merge is legitimate or not. It's a very simple idea, and it cuts very deeply. Or take late merge. Late merge has the same problem, but also an additional problem, which was pointed out by Epstein, Kitahara, and Seely: namely, late merge — where you're sort of tacking something on down here somewhere — one, has the problem I already mentioned, but secondly, it requires a substitution operation. Maybe this piece has to go back here; how does that happen? When you draw a tree you just draw it there, but that's not enough; you need some kind of operation that says the new thing you've constructed has to go right in the position of what you've constructed it from. That's a non-trivial operation, and it's way beyond the bounds of the SMT. So late merge, which has lots of consequences and is used very widely, also can't be correct. In fact I think most of the late merge literature — I won't have time to talk about this — probably reduces to ellipsis, but I'll put that aside; it's a problem to work on. Anyway, all of these things are out, along with all the consequences that follow from them. There's lots of stuff in the literature, very interesting consequences; they remain as just puzzles.

Well, if you think about these proposals, they do fall within the original loose and vague characterization of Merge, so it's natural that people should have proposed them. They seem to fall within the original — I won't say definition, because it wasn't clear enough to be a definition — but the original characterization of the idea, and that's led to interesting results, but misleading interpretations of them: they are not results, they are simply presentations of interesting data which we know have a problem that we're trying to figure out. Well, that suggests a research program. The research program is, first of all, to take all of the kinds of operations that have been proposed for Merge and ask which ones are legitimate and which ones are not; the second task is to formulate a new definition which captures just the legitimate ones and leaves out the illegitimate ones; and the third part of the research program is to explain why. So those are the three steps. I'll skip the first step: there's a lot of material in the literature showing which ones are legitimate and which ones aren't, and it runs essentially along the lines I've just outlined. It turns out that the only ones that are legitimate are the simple ones that we had in mind when the idea was first developed, namely external and internal Merge. All the rest — parallel merge, sideward merge, late merge, other proposals that have been made — are illegitimate, illegitimate on the very same grounds: they allow operations which produce illegitimate results, hence have to be excluded, and they all have the property that they're increasing accessibility too much. So that's the first part.

We then want to formulate Merge so that it will block these things — a definition where we have to say what goes in the rest of the workspace. I'm going to skip the formalities; it's easy to formalize all of this, but too much trouble here, and you can do it yourselves. Intuitively, the first condition that has to be met by the new definition is that nothing that was in the workspace can be lost: you can't throw out stuff from the workspace. If something was there that's distinct from P and Q — those get used up — it still has to be in the workspace. That's the first condition. The second condition is that the new definition of Merge has to be minimal, the optimal definition: it has to restrict accessibility as fully as possible. Now, Merge itself is creating one new accessible object — that's the point of merging, you're creating a new object — but it can't allow anything else to become accessible. That's the second condition. And the third is just that we don't want any arbitrary junk around — not a lot of other stuff that had nothing to do with the operation. Those are basically the conditions on the proper definition of Merge.

That leaves the next step: what's the explanation for these conditions? And here we basically have the explanation. The first point, which says you can't throw things out, is actually a special case of the no-tampering condition, the general SMT condition of minimal computation, which says you can't modify the elements that are entering into the computation; the most extreme form of modifying something is to throw it out, so you can't do that. That's the first condition. The second condition is the resource restriction, which I think probably reduces to a general third-factor property of the nature of the brain, the slow element in the cognitive system, the thing that's hampering cognition; so it probably just goes back to that. So we have the resource restriction, which limits accessibility. The third point, which says don't throw in any extra junk, is actually a consequence of restricting accessibility: if you throw in anything else, you're increasing accessibility. So we therefore have an optimal definition of Merge, capital MERGE, which meets all the conditions we want, and we have an explanation for each of them. So that's the set of flaws and failures, and a possible solution for all of them — a plausible one.
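Purely as an illustration — my own toy encoding with invented names, not a formalization from the lecture — the three conditions can be written down as a small check on a proposed workspace-to-workspace step. As before, nested tuples stand in for the unordered sets, and accessible terms are counted naively, so only external-merge cases are shown.

```python
# Toy check of the three conditions on MERGE stated above.

def terms(obj):
    """An object together with every occurrence of its subterms."""
    out = [obj]
    if isinstance(obj, tuple):
        for part in obj:
            out.extend(terms(part))
    return out

def ws_terms(ws):
    out = []
    for o in ws:
        out.extend(terms(o))
    return out

def check_merge(ws_old, ws_new, p, q):
    old, new = ws_terms(ws_old), ws_terms(ws_new)
    nothing_lost = all(x in new for x in ws_old if x not in (p, q))  # condition 1: no loss
    one_new_term = len(new) == len(old) + 1                          # condition 2: resource restriction
    no_junk = all(x in old or x == (p, q) for x in ws_new)           # condition 3: nothing extraneous
    return nothing_lost and one_new_term and no_junk

AB = ("A", "B")
ws = [AB, "C", "D"]
print(check_merge(ws, [(AB, "C"), "D"], AB, "C"))        # True: plain external merge of {A, B} with C
print(check_merge(ws, [(AB, "C")], AB, "C"))             # False: D was thrown out (violates condition 1)
print(check_merge(ws, [(AB, "C"), "D", "E"], AB, "C"))   # False: junk E appears (violates conditions 2 and 3)
```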
Well, there's something else that we have to point out here — I'm skipping a lot of things, but it doesn't matter. There's another consideration that we have to keep in mind; let's call it stability. It's a general property of computations which we don't usually bother to think about, but we have to be explicit about it, for reasons that show up when we try to explain things. For example, suppose you're forming a topicalization, say "Mary's book, we are reading": that's the optimal internal merge operation, which doesn't delete anything. When we pronounce it we drop the lower copy, but that again is for reasons of computational efficiency; we're now talking about just what reaches the mind, forgetting about what reaches the ear, the extraneous stuff which doesn't really belong to language strictly speaking. The thing that's taken for granted about this, but that we have to be clear about, is that the two occurrences of "Mary's book" have to be absolutely identical in every respect. It can't be one Mary up above, whose book is the topic, and another Mary whose book we're reading; it can't be that the topic is the book that Mary owns and the copy, the thematic object, is the book that Mary wrote, let's say. That's simply taken for granted: the interpretive system has to know when things are precisely, absolutely identical. So let's call that condition stability. It turns out to have consequences.

Now, that immediately begins to raise some interesting questions: what's a copy, and what's not a copy but a repetition? You can pick things out of the lexicon which are totally different but happen to be identical in syntactic and phonetic form: you can say "John saw John", and that's okay, but those are two different Johns; they have nothing to do with one another; those are repetitions. On the other hand, here, when you've done internal merge, you have copies. How does the interpretive system know what's a copy and what's a repetition? They look alike, but it has to know. That turns out to be a non-trivial question. There's an interesting paper by Chris Collins and Erich Groat — I don't think it's been published, but you can find it on LingBuzz — where they go through a lot of the problems in the various efforts to explain this. But you have to be able to explain it: the interpretive system has got to know what's a copy and what's a repetition. Now, I'm assuming phase theory here, for good reasons; that means that at the phase level, precisely at the phase level, the system has to know the distinction. Well, of course, at the phase level you do have information about what was produced by internal merge and what was produced by external merge, so you can see something, you have some information; but it's not complete, because there are cases of internal merge that are internal to the phase — for example, raising the subject to subject position inside the CP. That's internal to the phase, and at the phase level you don't see that it happened; it's just happened in there. So how is the interpretive system able to solve this problem? That raises quite interesting questions, and I think the basic answer is given by a general property of language which is sometimes called duality of semantics: interpretation of expressions falls into two categories. There's one category which yields argument structure — theta roles and the interpretation of complements of functional elements, basically argument structure. There's another category which is involved in displacement, which has discourse-oriented or information-related properties, or scope properties and so on, but not argument properties. That's duality of semantics. If you think about it a little further, you see that the first type, argument structure, is invariably given by external merge; the second type, the non-argument properties, is always given by internal merge. Now, there appear to be some exceptions to this; as usual, when you produce a generalization and look at the details, you find maybe something didn't work. If it's a strong generalization, the proper approach, just as in standard science, is to say the exceptions are probably misunderstood, and make the strong generalization work always. I think that's a good rule of thumb, used constantly in the sciences, and we should use it here. So I think duality of interpretation is probably a very strong principle. The question is where it comes from; probably it's just a property of the nature of thought. We don't know a lot about thought; in fact one of the main ways in which we come to understand thought is to study how it's linguistically articulated — that's one of the very rare avenues into the nature of thinking. And if it's correct, as I've been suggesting throughout, that language really doesn't care about use, about communication — that's kind of irrelevant to language; it cares about expression of thought — then we'd expect the design of language to capture these fundamental aspects of thinking. That's what I'm proposing.

Now, if you think about duality of semantics, you have a technique right away to determine what's a copy and what's a repetition: if something is in a theta position it's a repetition, unless it's been raised, in which case it's a copy of what's been raised; if it's in a non-theta position, it's a copy. At the phase level, the system simply has to take a look and say what's in a theta position and what isn't; that tells us what's a copy and what's a repetition. That cuts through the mass of objections and problems that Collins and Groat pointed out. If we restate that, what we'll be saying is that the operation Merge produces copies, always — both cases of Merge, external and internal, produce copies — and nothing else does. So a copy is just whatever is created by Merge. In the case of internal merge it creates two copies; external merge also creates two copies, the original thing and the thing that's in the set you formed, but the operation MERGE, which is in effect a replace operation, gets rid of the first copy in the case of external merge, so you never see it. So Merge always produces two copies, but external merge, just by the nature of the operation, the minimal operation, yields only one of them remaining. That's the restriction part.
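As a very coarse illustration — again my own toy encoding, not anything stated in the lecture, and ignoring the raising case and the complications Collins and Groat discuss — the bare theta-position criterion for two identical-looking occurrences might be sketched like this:

```python
# Toy sketch of the theta-position criterion for copies vs. repetitions.

def classify(occ_a, occ_b):
    """occ_a, occ_b: dicts with 'item' (the phrase) and 'theta'
    (whether that occurrence sits in a theta position)."""
    assert occ_a["item"] == occ_b["item"], "the question only arises for identical items"
    if occ_a["theta"] and occ_b["theta"]:
        return "repetitions"   # two independent theta positions, as in 'John saw John'
    return "copies"            # a non-theta position is involved, as in topicalization

# 'John saw John': both occurrences are independently theta-marked.
print(classify({"item": "John", "theta": True}, {"item": "John", "theta": True}))                  # repetitions
# Topicalized 'Mary's book, we are reading (Mary's book)': the topic sits in a non-theta position.
print(classify({"item": "Mary's book", "theta": False}, {"item": "Mary's book", "theta": True}))   # copies
```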
Now, that still raises some interesting questions. If you start thinking about the complexity of constructions — double-object constructions, topicalization, small clauses, and so on — it looks as if there are ambiguities in searching for what is a copy of what. What we have to show is that language has a conspiracy that determines the answer in each case. For example, in the case of double objects, Vergnaud's abstract Case theory, if you think about it, uniquely determines the answer. In the case of topicalization, when you have a topic up there and you want to find out which of many things that look identical it comes from, if you think about, say, Luigi Rizzi's left-periphery theory plus the theory of labeling, which determines criterial positions, it will solve that problem. The case of small clauses is extremely interesting; there's not time today, but if there's time tomorrow I'll come back to it — there's an interesting way of resolving that ambiguity. But here's the problem, as I posed it: there has to be a conspiracy in the structure of language that automatically resolves ambiguities about what's a copy of what.

So we have two problems about copies. One: what's a copy and what's a repetition? That's solved by defining a copy as just anything produced by Merge, and nothing else, and by duality of semantics. The second question: what's a copy of what? That has to come from some internal conspiracy about the nature of language. I think there are answers, but it's not trivial; I leave it as something to think through. I should point out in addition — you can think about this; I won't go into it — that none of this works for the illegitimate operations. If you do sideward merge or parallel merge and so on, none of this is going to work out: you do have the problems of copy versus repetition and of identification. That's an independent argument showing that the illegitimate cases are illegitimate. The main argument is that they just yield illegitimate outputs, but it also turns out that the account of copies, repetitions, and unambiguously finding copies fails for the illegitimate cases. You can try it out and see how it works for yourself.

Well, notice that you can now rephrase the whole process of constructing Merge and its application in terms of the normal way in which recursive processes are described. I won't bother writing it out; I'll just read it to you — it's simple. Suppose you want to define the set of integers. How do you define it? What you say is: the set of integers is the smallest set, the least set, containing one and containing the successor of any integer. That's the set of integers; that's transitive closure, the Fregean ancestral, the classic way. So what do we say here? What we say is that, for a given I-language, the set of workspaces is the set — notice, not the least set — containing the lexicon and containing MERGE(P, Q, WS) for any P, Q and workspace WS that has already been generated. It's the same as the definition by transitive closure of the set of integers, with one exception: we don't have to say "the least set". It's simpler here; we don't have to say it because resource restriction already forces it to be the least set. So essentially we're in the right ballpark: we have the standard recursive, inductive definition by transitive closure of the set of workspaces, and that's exactly where we want to be. I've kind of skipped a lot of steps here — a lot of things have to be filled in — but I think maybe this is enough of a picture so you can see how they can be filled in.
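For concreteness, here is one way the parallel just read out might be written down in standard notation — my own rendering, with symbols not used in the lecture:

```latex
% The integers by transitive closure (the Fregean ancestral):
\[
\mathbb{N} \;=\; \text{the least set } S \text{ such that } 1 \in S
  \ \text{ and }\ n \in S \Rightarrow \mathrm{succ}(n) \in S .
\]
% The set of workspaces for a given I-language, as described above;
% no "least" is needed, since resource restriction forces it:
\[
\mathcal{W} \;=\; \text{the set } S \text{ such that } \mathrm{Lex} \in S
  \ \text{ and }\ \mathit{WS} \in S \Rightarrow
  \mathrm{MERGE}(P, Q, \mathit{WS}) \in S ,
  \quad \text{for any } P, Q \text{ accessible in } \mathit{WS} .
\]
```

The only difference from the textbook definition is the absence of "least", for the reason just given.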
Well, just to finish for today — tomorrow I want to go on to another set of topics — let's ask what we can say about the things that are left unexplained. So take, say, ATB, across-the-board deletion: "what book did John buy and Mary read", a very simple case of ATB. This is nicely handled in terms of multidominance and parallel merge, but unfortunately that's illegitimate, so we have to ask whether there's a legitimate way of doing it. Well, if you think about it, the simplest approach is to say it just arises by deletion. From what? Remember, there's also "and what book did Mary read" — just a conjunction of the two. So why doesn't it come from "what book did John buy and what book did Mary read"? There's a standard objection to this: nothing tells you it's the same book in both cases. That looks like a problem. However, notice that we already have a solution to that problem. Namely, let's take a look at what the copies are: this is a copy, this is a copy, this is a copy, and this is a copy. We have a principle, independent of this, that tells us that when we have a series of copies, the first one, the top one, remains and all the others delete. So we automatically get the right form. But what about the right interpretation — what about the fact that the two books might be different? Here the notion of stability enters. Deletion in general — quite apart from this, as in the topicalization case, ellipsis, anything else — has the property that you have to have absolute identity, otherwise you can't delete. If you want to delete under, say, VP-ellipsis or topicalization or whatever it may be, there has to be absolute identity; that's the stability property. So in fact it turns out automatically, without any comment at all, that you get ATB just by following the principles already established.

Now, there's something crucially important here, namely that you don't have C-command between these two wh-phrases; if you did, it wouldn't work. Take a sentence — I won't write it out, it's too much trouble — like "which boy did John ask which boy Bill met", with the intermediate copies of "which boy": notice that there you do have C-command all the way through. That means that at the phase level, where you're getting the copy of "which boy" — in the top case it's in a theta position — it's looking at a copy below it which it C-commands, and that tells you it's a repetition, by definition; so at the phase level you know that these two things are repetitions, not copies, because there's a C-command relation between them. So, crucially, the lack of C-command in coordination yields ATB, but the same effort will fail, because of phase-theoretic considerations, when you try it in a case like "which boy did John ask which boy Bill met" and so on. You can write it out and figure it out for yourself, but it all turns out mechanically; there's nothing more to say about it. Pretty much the same seems true of parasitic gaps; you can try it on your own. So the standard examples that are used for multidominance and parallel merge fall into place without any comment; they just come from following the simple processes of minimal computation, resource restriction, and stability, the general principle of deletion. Well, the task is to show this for everything — not a small task. I'll leave it there and turn to some other problems tomorrow. [Applause]
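Purely as a toy illustration of the identity requirement on deletion invoked for the ATB case — my own sketch, with an invented representation that models nothing of the syntax itself:

```python
# Toy sketch: deletion under absolute identity ('stability'). Given the
# occurrences of the wh-phrase in the conjoined structure, keep the first
# and delete the rest only if they are strictly identical.

def delete_under_identity(occurrence_list):
    first = occurrence_list[0]
    if all(occ == first for occ in occurrence_list):
        return [first] + ["(deleted)"] * (len(occurrence_list) - 1)
    return occurrence_list        # identity fails: nothing deletes, no ATB form

# 'what book did John buy and what book did Mary read'
print(delete_under_identity(["what book", "what book"]))
# -> ['what book', '(deleted)'] : the surface form 'what book did John buy and Mary read'

# Non-identical phrases: deletion is blocked.
print(delete_under_identity(["what book", "which paper"]))
```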
Does someone have a question? Yes, over there — there's a microphone over here; I'll show you where.

Professor Chomsky, thanks so much for coming. One question that's been running through my mind: as someone who has seen the evolution of machine learning and related technologies and computers, I wanted to ask what your attitude, your hope, your forecast is for this growing integration of machines and linguistics — the development of the machine as human, since linguistics is, I guess, something that's inherently human, something characteristic of us. What is your attitude towards that transformation of the machine into the human?

Oh, I'm sorry — okay, I thought you were already answering. So the questioner is coming from the perspective of someone involved in machine learning and questions like that, and he's interested in what you think the future prospects are for interaction between linguistics, as a science of human language, and artificial intelligence or machine learning — what is to be expected, what your attitude is towards that, and towards how we are starting to create, in a sense, artificial humans.

Well, yesterday I think I mentioned a forthcoming book which in my opinion is addressed to this problem and gives good potential answers — a book by Gary Marcus, a cognitive scientist who works on artificial intelligence, and a colleague of his. The book is a very nice, very lucid exposition — it's coming out soon — of the kinds of results that have been achieved in machine learning, and it argues that they're heading for a dead end and tries to give an explanation of why it's a dead end. I think there are some deeper theorems coming along which may establish that more firmly. It then suggests that to get around this dead end it's probably going to be necessary for artificial intelligence to look more closely into the question of how human intelligence works, in particular how children acquire their conceptual, linguistic, and other cognitive structures. They suggest, plausibly I think, that to overcome this gap, instead of just looking at the power of machines — the fact that you can compute fast and you've got a huge amount of data — you take a look at the actual mechanisms that are used, particularly by children in language acquisition. So if the studies of language acquisition, which are pretty rich, integrate with the inquiry into machine learning, maybe that dead end which they point to can be overcome.

Actually, that goes back to an early question, fifty or sixty years ago, about what artificial intelligence is. There were two different paths it could take. One path was to say: let's try to discover what intelligence really is, using the capacities of computers, computational theory, and so on as means to answer this — let's treat it like a problem of physics, where we use the mathematical and technological tools available to see if we can understand how the world works. That's basically natural science. The other approach is to say: let's just try to simulate what happens. That's very different from natural science. In simulating what happens, you can often do it, but you're not understanding anything, and it's a departure from natural science. Artificial intelligence has largely taken the second direction. There were people who were more interested in the first direction, like Marvin Minsky, for example, but they became increasingly marginalized in the field, even though he's one of the founders of the field. But there are these two directions. One is: can we use the technological capacities of fast computers and the mathematical ideas of computational theory, maybe of neural network theory, whatever it happens to be — can we use those tools to discover, by the normal means of science, what intelligence really is, for example how children in fact make this incredible leap from very limited data to very rich knowledge? The other task is: can we just simulate what's happening? The second task yields engineering successes, like the Google translator — it's basically brute force, but it's useful, and there are other useful things — without understanding what they're doing or anything like that. There is a kind of Silicon Valley ideology which says that's all that's needed: let's just get it to work, and who cares about the rest; if it works most of the time, fine. That's very different from the sciences. So, for example, in physics, if you had a way of simulating the results of 99% of the experiments that take place in the world but you missed all the critical experiments, nobody would care. What matters is the few critical experiments, the ones that explain something; you could match all the other experiments, and that says you wasted your time. The engineering approach takes the opposite view: it says, if we can match 99% of the experiments, fine — the critical experiments are usually quite exotic anyhow, like what's happening at the bottom of a South African mine where you can maybe pick up the detection of a neutrino; great, but that's not going to happen in the real world. But if you want to understand, it's the critical experiments that matter.

As I suggested last time, I think it's useful to take a look at the early history of science. AI, cognitive science, and linguistics today are pretty much in the same state as, say, physics in the seventeenth century: just beginning to explore topics where there was a lot of apparent knowledge but no real understanding. And if you take a look at what happened then, it's pretty interesting. The real breakthrough, say with Galileo — take a real example. For two thousand years it had been assumed that a heavy object falls more rapidly than a light object, and there's plenty of phenomenal evidence to support that — in fact, just about everything you see. So that essentially Aristotelian view was pretty much taken for granted up until Galileo. Galileo carried out one experiment which disproved it; furthermore, he didn't actually carry out the experiment — it was a thought experiment. Galileo produced a thought experiment which shows it can't be true, and that changed science radically. The thought experiment was very simple — you can read the Dialogues. Take two cannonballs, a big heavy cannonball and a small cannonball, and say the big one falls faster than the small one. Now attach a chain between them. Since the small one falls less rapidly, it's going to hold the big one back, so the big one won't fall as fast as it would have without the chain. On the other hand, now that you have a chain between them, you have one mass, just one thing, so it's going to fall even faster. So you have a contradiction: if you think one thing falls faster than another, it both falls faster and falls slower. We've disproven two thousand years of physics, and nobody carried out the experiment — you can't carry out the experiment; if you tried, it probably wouldn't work — but it's enough to finish off two thousand years of misunderstanding of a basic element of nature. A lot of modern science was like that, and in fact if you go right up to the present the same is true. One of the famous significant experiments in modern quantum physics is the Schrödinger's cat story — is the cat alive or dead; you pass light through a slit, look at it one way it's particles, another way it's waves. I don't think anyone's ever carried out the experiment — it's probably hard to carry out — but just thinking it through: yeah, that's right, we've got this paradox, and we've got to overcome it. A lot of science is really just thinking through what's got to happen under certain circumstances.
Now, there are cases where you have to actually do experiments to check things. So yes, you want to look at actual languages and see what their properties are, but any serious experiment is going to be based on theoretical understanding. In fact, just for the linguists among you, compare traditional anthropological linguistics with corpus linguistics. In corpus linguistics you take a fixed set of objects, and each one of them is an experiment — "this is a legitimate sentence" is basically an experiment. Field methods in anthropological linguistics didn't work like that. When you took a course in field methods as a student back in the 1940s, the way I did, what you were taught was how to ask the right questions. Every field worker knows this: you have an informant, and you don't just collect every possible sentence they can produce; you try to elicit information from the informant about things that are relevant to the nature of language. You want to find out: does this language have parasitic gaps, or how does this language handle ATB, or some other thing. That's science: theory-guided experiments. Now, going back to your point, the direction that's been taken in sort of Silicon Valley linguistics is corpus linguistics: let's just take whatever data there is and see if we can match it. You get something, but it's very likely going to hit a dead end, and I think it probably is doing so. Probably the right way is the way science has always proceeded: do theory-guided experiments and try, in this case, to discover — and we have by now lots of interesting evidence about this — how children in fact do acquire language. What do they do, what do they know at age one, what do they know at age two, how have they gotten there, what was the data, what were the procedures they used? And out of that, try to design the kinds of inquiry, using the conceptual and physical technology available, that can carry the inquiry further. I think that could be a constructive integration of the two disciplines.

Okay, I saw your hand first, and then you.

Hello. My head is full of questions, and the one I want to ask is about internal merge: how does it work if, after you build something, the pieces of it are no longer accessible? How can you do internal merge if the pieces are no longer accessible?

You mean if the two initial terms that you merged are no longer accessible? Well, first of all, they are still accessible. It's only if you've done internal merge that the lower one is inaccessible; the higher one is accessible. And the reason the lower one is inaccessible is just minimal search: if you're higher up and you're looking for something to move, you'll find the higher one first and you won't ever find the lower one, and that's the way we want it to be. Suppose you're doing, say, successive-cyclic wh-movement: you move a wh-phrase to the edge of the lower phase, and at the next step up you don't want to see the lower one anymore; you just want to move the higher one. So yes, one is always accessible — the one that's not blocked by either C-command or by the Phase Impenetrability Condition, which says that anything that's already down low in the phase you can't look at anymore. That tells you those are gone. But we now have a richer concept that incorporates these and much else, which is that you have to restrict accessibility altogether to the minimum. It doesn't mean accessibility is eliminated: the things that were accessible may still be accessible, but you're not adding any new ones except the one that you're forming.

A question down here — it's the exact same question, but I want to rephrase it, because I still don't understand. You have P and Q in your workspace; you merge them; now you have {P, Q} and no more P and Q. How is it then possible to merge P with {P, Q} again, to do internal merge, and how does that not fall victim to the same problems that multidominance or parallel merge would have?

So he said: suppose you start out with P and Q, you merge them, and you get the set {P, Q} but nothing else. He wants to know how you can then do internal merge if P is gone.

Suppose we want to merge P with {P, Q}. P is accessible: this P in here, inside {P, Q}, doesn't disappear. So we merge P with {P, Q}, and we get the set {P, {P, Q}} in the new workspace. And remember that the definition says anything that was in the workspace other than P and Q has to remain — that's the minimal condition. So the new workspace will just contain the new object, P merged with {P, Q}. If this was "I saw what" and P is "what", the new thing will be "what I saw what", but nothing else.

But couldn't you take this and make it as complicated as — in other words, you're asking why you can't take this and turn it into something as complicated as you want, and wouldn't that be a problem? Well, if you try to do that step by step, you see that you're adding accessibility every time you do it; you'll be adding something extra that's accessible. Write it out and run through it. It's a good question.

A question over here — the person with the mic.

Hi there, Professor. I just had a quick question; let's see if I can formulate it. In terms of taking into account cultural impacts on a language as significant drivers in communication development, how do you take that into account in terms of how children acquire language? And, tying that into universal grammar as a purely logical, biological model, how does enculturation from different communities factor into language acquisition?

How does acculturation, the influence of the culture within which language is embedded — from an E-language, an external-language, point of view — play any role in the model of acquisition based on biological endowment and the other two factors?

Well, in real language acquisition, the actual process in a child, many things are interacting, not just language. The nature of social relations affects how a language is acquired — what's your relation to your mother, to your aunt, to the kids in the street, and so on; all of that will have all sorts of effects on the way language acquisition proceeds. But we're basically abstracting from that and asking, well, what would happen if there were no culture? Then, if you want to study the real situation, you have to take this and add lots of other things.

Would this be analogous to making the idealization to a homogeneous speech community?

Exactly the same idea. It's very much like what you do when you try to study a language: you take an informant and you try to abstract away from everything that's particular to the informant, and you try to pretend that the speech community is a group of people who are basically identical.
who are basically identical of course they're not you know if you look closely they were all going to be different but that's the standard abstraction that you make if you study any collection of things you know and a good experiment tries to get rid of all of the extraneous things and make the objects you're looking at as identical as possible when you're carrying out an experiment with humans you can't interfere so you just have to pretend no oh boy okay I see a question at the back and there's also a question over here at the back I guess you could go for hi mr. Chomsky I got two short questions for you one how are you good to hear that he has two short questions the first one is how are you seem to be standing up that's a good answer - how was your work in other fields and other disciplines influenced your work with linguistics specifically thank you how does your work in other disciplines and other fields impact your linguistic work specifically a single human being is all kind of complicated interactions take place but a kind of a level of logical connections none personal connections yes sure all sorts of things question over here yeah hi I just wanted to actually jump back to the very first question that you had on sort of natural language processing an AI you spoke a little bit about how you feel like that sort of natural language processing is headed towards a dead end and I was wondering if you thought you mentioned that language is thought based or designed for thought and is a model after human thinking rather than a system so is wondering if sort of the impossibility of recreating language in a computer is due to the fact that I mean as we know computers can't think so then in what my line would we be able to if at all recreate the kind of language that humans have okay correct me if I'm getting anything wrong but she wants to return to a question about natural language processing and AI and the conjecture that maybe natural language processing or AI or approaching a dead end and she wondered if you could connect that issue to your claim that language is designed to meet internal considerations like reflecting thought or integrating with thought yes communication and if it is correct that computers don't really have thoughts and don't really think the hood that they donated goemon okay yeah but I think there's some loose connection the when you're just investigating the data that appear you know you're looking at say the Wall Street Journal corpus you're looking at things that were chosen for the purpose of communication okay but that doesn't that means they're a biased sample from the point of view of the study of language it's a subset of possible expressions that were selected for a particular reason to convince people to convince people that you shouldn't raise taxes on corporations okay so it's gonna be a very biased collection of data so of course it'll lead you in the wrong direction if you want to do the right experiments like say you're not going to get the answer to a TV from The Wall Street Journal corpus they don't care about sentences like that just like if you're studying motion and the 17th century you're not going to care about two cannonballs connected with a chain if you want to just duplicate things that are happening so yes the selection of materials in the standard natural language processing material the procedures the standard the collection of materials that are used the data are already heavy heavily biased towards the assumption which i think is a 
false assumption, that language is structured so as to be efficient for communication. The reason it's biased is that you're picking the things that were selected for exactly that reason. For example, if you had used sonnets, let's say, as your database, you'd conclude that all linguistic expressions are fourteen lines long. It's just a database that happens to bias you in the wrong direction. And with regard to your earlier question, a good question, I think I gave the wrong answer: it's actually the phase that blocks it. Yeah, that's exactly right.

So, further questions? Okay, first Susan over here, then Ethan. Okay, actually you can cut out the intermediary.

So, in an effort to eliminate any unmotivated movement, postulating movement just for linearization purposes, let's say in our syntactic theory we want to avoid any unmotivated movement just for the sake of linearization, right? Could we possibly say, for languages that have more flexible word order, that those orders are the result of that? I think it doesn't help. Okay, so in an effort to eliminate any unmotivated movement for the sake of linearization, and going back to the earlier point about that relation, could we say, for languages where we see more flexible word order, that those orders are actually a result of computational efficiency? So the fact that for SOV languages we have multiple possible word orders would be a byproduct of computational efficiency rather than scrambling room being made for some arbitrary reason.

So ultimately the question is about, for example, SOV languages, and the conjecture that SOV languages have greater freedom of word order than verb-initial languages, and her question was: is this, do you think, related to the idea of not allowing unnecessary movement, or is it related to computational efficiency? I'm not sure I'm getting your question right. Is there a solid generalization here? I would say anecdotally it does appear to be the case that verb-final languages have some degree of freer word order, but I don't know, I don't really know. It's interesting; if it's true, it certainly asks for an explanation, and I don't see one. Anybody know anything about this? Latin? Yeah, but I think Carlos's point is that some languages which have been claimed to have completely free word order don't really have it, and the differences involve things like topic or focus or other factors, or in the case of Latin just a rich visible case structure, for example. But I don't know if it's generally true when you abstract away from things like that, so I just don't know what the facts are. I think the gentleman in the black shirt had something.

Wouldn't that also be the case in Russian as well? It appears that there is free word order in Russian sentences, although there is some context as to when certain word orders are utilized in certain situations.

Basically he said that the same kind of thing has been attributed to Russian. That is true; some people have traditionally said it has free word order, but other people have argued that actually certain word orders are more appropriate to some situations than others, and that in turn could be related to topic and focus.

Well, it was believed back in the 70s that there was a free word order parameter, and the
free word order was assumed to be the case for German, Japanese, lots of languages which are at base SOV languages, though you don't see it right away in German, but I think that was pretty much shut down. I don't know; maybe some of you know more. I don't know of any current generalization of this kind that's well substantiated.

Ethan, you had a question? Microphone, yeah, I think we might as well, for everybody else behind you.

Okay, my question is also about merge, kind of related to Daniel's question. If P in the set {P, Q} is accessible for merge, how come you can't merge that P with something else?

So Ethan says: in the case of {P, Q}, where P is merged with Q, if P is accessible for internal merge, how do you prevent it from being accessible for external merge with something else in the workspace?

Well, that's parallel merge, exactly, or sideward merge. What stops it is that you're adding more accessibility. If you do parallel merge, which is essentially what you're describing, you have, say, {P, Q} and R; you take Q and you merge it with R; you've now added two new accessible things, {Q, R} and the new Q, and you're only allowed to add one accessible thing. That's the resource restriction. So that blocks parallel merge and everything that follows from it. Sideward merge is the same; late merge is the same, plus extra problems like the substitution operation.

Okay, I think I saw someone else before, but I'm not sure; did someone else have their hand up, or was it you? Okay.

Why is first merge okay, then, between two things?

Okay, so her question is: given your answer to Ethan's question, how do you just do external merge between two things to begin with? Doesn't that have the same kind of problem?

It creates one new thing. If you take P and Q and you merge them, you get one new accessible thing, the set {P, Q}, which is okay. And in fact notice you've reduced the workspace: the workspace originally had two elements, now it only has one, and three accessible things where before it had two accessible things. You have P and Q in the workspace; it's a two-member workspace, two accessible items. You merge them; you have the set {P, Q}; that's the entire workspace. You only have one element in the workspace and three accessible things: P, Q, and the set {P, Q}. But that's what you want: every operation is going to add at least one thing, otherwise the operation didn't do anything, but no more than that.

A question, well, yep, from Tyler over here in the black shirt.

Does this assume, then, that the entire lexicon is in the workspace to begin with?

It has to be. I mean, any computational system is going to have atoms and rules, and you can always access the atoms. If it were proof theory, the atoms would be the axioms and the rest would be all the theorems you've produced, but at any stage in the generation of a proof you can go back and use an axiom. It's basically like that.

Question at the back, yeah.

So is it correct to say that when you are making copies you don't expand the workspace?

Is it correct to say that when you are making copies you don't expand the workspace?

Well, internal merge creates a copy, but by the definition of merge you don't expand the workspace with the copy it creates, because of the resource restriction. In the case of external merge you're actually reducing the workspace; in the case of internal merge you're leaving the workspace with the same number of elements but one more accessible item, and of the two copies, notice that one of them is inaccessible because of c-command.
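The accounting in these exchanges can be illustrated with a rough sketch that just counts accessible term occurrences before and after an operation. This is an illustrative toy model, not Chomsky's formalism: syntactic objects are Python frozensets, atoms are strings, and the inaccessibility of lower copies under c-command just mentioned is not modeled, so only external merge and parallel merge are compared.

```python
# A rough illustration of the resource restriction discussed above:
# syntactic objects are frozensets, atoms are strings, the workspace is
# a set of objects, and accessibility is counted over term occurrences.
# All names are illustrative; copies made inaccessible by c-command
# (the internal-merge case) are not modeled here.

def occurrences(so):
    """Count every term occurrence in a syntactic object, itself included."""
    n = 1
    if isinstance(so, frozenset):
        for part in so:
            n += occurrences(part)
    return n

def accessible_count(workspace):
    return sum(occurrences(so) for so in workspace)

# Start: the workspace holds {P, Q} and R -> 4 accessible occurrences.
ws0 = {frozenset({"P", "Q"}), "R"}

# External Merge of {P, Q} with R gives {{P, Q}, R}: one new accessible
# item (the new set), and the workspace shrinks from two members to one.
ws_external = {frozenset({frozenset({"P", "Q"}), "R"})}

# Parallel merge of Q (inside {P, Q}) with R gives {P, Q} and {Q, R}:
# two new accessible items ({Q, R} and a second occurrence of Q),
# which is what the resource restriction rules out.
ws_parallel = {frozenset({"P", "Q"}), frozenset({"Q", "R"})}

print(accessible_count(ws0))          # 4
print(accessible_count(ws_external))  # 5 -> allowed (adds one)
print(accessible_count(ws_parallel))  # 6 -> blocked (adds two)
```

On the same accounting, external merge of bare P and Q takes a two-member, two-occurrence workspace to a one-member, three-occurrence workspace, matching the figures given in the answer above.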
So, given your treatment of copies and how copies are deleted, does this mean that what is canonically treated as movement and what is canonically treated as ellipsis are essentially the same phenomenon, and if so, why does that make sense?

Given what you've said about copies and deleting copies, and what you've said about ellipsis and so on, does this imply that movement and ellipsis are in some sense the same thing?

Well, there's a similarity, but I think that fundamentally they're different. The ellipsis that's involved in movement has to do with externalization. Now, if we take a look at normal ellipsis, you know, "John read the book and I did too," my belief is that ellipsis probably doesn't exist except at the pure phonetic level; probably, when we understand it, it will turn out to be just a reflex of deaccenting. So notice there is a sentence "John read the book and I read the book too," with the second "read the book" deaccented, and that has to be generated, and then one of the options, once you've generated it, is just to delete the deaccented material. Now a lot of ellipsis just falls out like that. That's a suggestion that Howard Lasnik and I made a long time ago, back in the 80s I guess, which was then expanded into a dissertation by Chris Tancredi around 1990 or '92 on deaccenting, and you get a very large part of ellipsis that way, though not all of it. The question is, what about the rest? My guess, and I think here is a real, significant research topic, is that all of the rest of ellipsis will probably fall under the same principle if we understand it properly. If you actually look at the literature on ellipsis, there are a thousand conditions that are imposed to get it to work, but my guess is that that's a reflection of a lack of understanding. It's the same kind of rule of thumb that I mentioned before: if there's some simple principle that works for the basic core of cases, but there are some things hanging around on the outside that it doesn't work for, chances are that the things on the outside are just things we don't understand. I think that's a major research topic.

I think we're done. Okay.

Professor Chomsky, I just flew in from Washington, DC today to be here with you.

Glad to be with you.

I just flew in from Washington today to be with you.

He said he just flew in from Washington today to be with you and hear your talk.

And this isn't a technical question about linguistics; this is a question about the 1950s. During the 1950s, when you published Syntactic Structures, there seemed to be an idea that everybody was born like a blank piece of paper and that they learned language from nothing, and then you came along and said no, there's the poverty-of-the-stimulus argument, and people are born with a genetic endowment, and so on. My question is: how important is it to prove people wrong, or should you be scared? It seems like existing narratives tend to want to protect themselves, and if you're a revolutionary you're going to take the hits. So how tough was it for you in the 50s to come up with new ideas, and how battle-scarred did you become because of your new ideas? Thank you.

Oh, for that kind of new idea you maybe just get yelled at and condemned.

You're lucky B. F. Skinner didn't challenge you to a duel, or maybe he's lucky he didn't. Okay, thank you very much for today. [Applause]
