0:01 Hi, my name is Steve O'Hara and I'm the founder of Eagle Legacy Modernization LLC. This is part three of three. Part one dealt with parsing. Part two dealt with dynamic analysis and interpretation, and this final part deals with transformation and generation. Again, all of this is about programming languages. Hopefully you've seen the other two videos or understand their content, because it is very much relevant to this one. If not, it might not be a bad idea to go back and look at them real quickly.
0:38 So first I want to talk about transformation, because transformation is a really hard problem. Many companies and organizations have been trying to do this for decades. There have been some fairly spectacular failures where hundreds of millions of dollars have been spent with little to nothing to show for it. Some examples are the California Department of Motor Vehicles and the Internal Revenue Service, and there are many more over the decades that have invested a ton of money and come out empty-handed.
1:08 Why? Because it's a hard problem. And one of the reasons it's so hard is that it has to be precise. When we look at all the LLM and AI kind of stuff going on today, it's all essentially a non-deterministic approximation. In other words, if you ask an LLM, ChatGPT or whatever, the identical query twice, you may get a different answer each time. And that works fine for humans. Humans are just fine with an approximate answer; we can work with that and tune it to suit our needs. But for programming language transformation, you don't want your bank to be approximating anything. You want it to be as precise as possible. There is room for AI, there is room for LLMs here. I'm not excluding them, but very clearly, to me, precision is mandatory.
This is not an opportunity for an approximate transformation of your code from one language to another. It's an opportunity for you to do it precisely and verifiably and predictably and deterministically, and so on.
2:23 Another part of transformation that's difficult is that a lot of the commercial tools, and there are still a few out there, tend to produce what we pejoratively call JOBOL. In other words, you convert COBOL to Java, and yes, it's in Java, but it still looks like COBOL. So if you still see picture clauses and performs and other relics of COBOL in your Java code, well, that's not really a good thing. That's really rehosting more than it is transformation.
2:55 So to do transformation right, you really have to have both static analysis and dynamic analysis, and good test suites, and a whole bunch of other things. It's a very complicated task. What our organization is doing at Eagle Legacy is building essentially the foundation for doing transformation well, what we consider to be the right way. We've got a lot of tools to show, a lot of bits and pieces, and we'll demo some more in this little video.
3:28 We don't want to end up with JOBOL. We want to take COBOL and end up with something where you shouldn't really be able to tell that it came from COBOL. It should really look like Java. Maybe not as nice as if an expert Java programmer had written it, but reasonably close. We also don't want to have a lot of runtime libraries. A lot of those tools essentially have a runtime library call for each COBOL statement, and we really don't want that. We really want to produce reasonably clean, correct Java code that does not show its history of coming from COBOL in the first place.
4:17 There are a lot of little challenges along the way too. Those are some of the big ones.
The little challenges are things like adding parentheses in a lot of situations. And if you have subscripts, and one language starts the first one at one while the other language starts at zero, you end up with a lot of things like plus one and minus one, and sometimes plus one minus one, which isn't really much fun either. So there are a lot of small issues along the way as well. We haven't really spent too much time fixing those small issues. We've been really focusing on the big ones, but they are an issue as well.
4:58 All right. I want to start by showing you the code. We're doing two things here: transformation and generation. Transformation takes you from, at the moment, any one of 25 languages into any one of three languages. In other words, we only support three targets for transformation: C, Java, and Python. Rust is progressing and is well on its way, so I would say we're at maybe three and a half languages. Rust is quirky. I've never had to cast an integer constant or a string literal before. So there are some annoying things in Rust, and it's because I'm not an expert in Rust. I'm absolutely still learning it.
5:50 But we have 25 source languages and three target languages, so there are 75 different transformations happening here simultaneously within the same framework. And I want to show you how that works. We're going to start by looking at just one language, COBOL, and at just one verb in COBOL, the compute statement, which is essentially an assignment. And we're only going to look at generating Python. But keep in mind that COBOL is one of 25 and Python is one of three, and they all look very, very similar.
6:28 All right. What you see on your screen now is the COBOL compute statement. This is inside of Eclipse.
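A minimal sketch of the plus-one/minus-one subscript issue mentioned above, written in Python. The helper name `zero_based` is hypothetical and is not part of the Eagle framework. It works on plain strings only to keep the example short; a real transformer would presumably work on the expression tree instead.

```python
def zero_based(subscript: str) -> str:
    """Rewrite a 1-based subscript expression as a 0-based one (hypothetical helper)."""
    text = subscript.strip()
    # A literal subscript can be adjusted at transformation time: 3 becomes 2.
    if text.isdigit():
        return str(int(text) - 1)
    # "i + 1" already carries the offset, so "+ 1 - 1" folds away entirely.
    if text.endswith("+ 1"):
        return text[: -len("+ 1")].rstrip()
    # Anything else needs the explicit adjustment.
    return f"{text} - 1"


print(zero_based("3"))      # 2
print(zero_based("i + 1"))  # i
print(zero_based("i"))      # i - 1
```

The middle case is the one that avoids emitting `i + 1 - 1` in the target code.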
6:35 Okay. What we're looking at in the top part, these few lines here, is the parse information. Then we have the interpreter, which is tiny. This is all it takes in our little world to interpret the COBOL compute statement: you pick up a value and you assign it.
7:00 And then here is the transformation. Notice that the transformation doesn't have any knowledge of what the target language is. It doesn't matter whether you're going to C or Java or Python or whatever. It's generating against this sort of abstract target language, this generator. This Eagle generator is an abstract class. There are three instances of it, well, four if you count Rust, and they handle all the language-specific kinds of things. So there's a thing called a new expression statement, which allows you to do an assignment, and you can kind of see how that's built up here. We create an assignment statement and then we store it. I mean, that's it. So this is transformation: it takes the source code and generates this abstraction layer.
7:55 Then there's a Python assignment, a C# assignment, a Java assignment, and actually a Rust assignment. And here's the generator code. This is the code that generates Python. There's equivalent code in C and equivalent code in Java. They take the same parameters and they produce a statement in their own language.
8:23 One of the things I want to show you in here, and yes, there's at least one example of it here, is that there are annotations for controlling the spacing. Things like blank lines and indentation and spacing are all controlled through various annotations, so you can make the output look pretty. So if you want the curly braces on the same line or on a different line, etc.
All of that can be handled through this set of annotations.
9:02 All right. So this code right here that you're looking at now: this transform is for going from Python, if you're transforming from Python to whatever. And this generate is for when you're transforming from whatever to Python. So the generator is only in a few of these files. There are essentially three or four blocks for every element in the language. There's the parsing part, which is normally declarative. There's an interpret, there's a transform, and sometimes, if it's one of our target languages, there's a generate.
9:47 All right. Now, this website is available, eagle legacy.com. You can go to it right now. We're on the actual website, and the samples are where we're going to spend the rest of this little video. This is again dealing with the same three programs we had before: the Roman numeral translator, from numbers to Roman numerals and back, the expression tester, and the statement tester. We're going to focus on the Roman numeral one.
10:18 We're going to kind of pick a language here randomly. Well, let's stick to COBOL. Some of these don't have a little T on them. The little T means transformation. So DOS, Bash, Lisp, and Assembler currently aren't being transformed to C, Java, and Python. I didn't really think it made a whole lot of sense to convert Bash to C or something. I don't know. It could be done. Maybe it will be someday, but it didn't seem to be in the same vein as the rest of them.
11:00 But anyway, let's look at COBOL in more detail in terms of transformation. And again, remember that this little block shows identical output from each of these in their raw form as well as after transformation.
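The split between transform and generate described above can be sketched roughly like this in Python. This is an illustration only: the names `Generator`, `PythonGenerator`, `JavaGenerator`, and `transform_compute` are made up for the example and are not the actual Eagle generator API, which the video shows only briefly.

```python
from abc import ABC, abstractmethod


class Generator(ABC):
    """Target-neutral interface that the transform step talks to (hypothetical)."""

    @abstractmethod
    def assignment(self, target: str, expression: str) -> str:
        """Return one assignment statement in the target language."""


class PythonGenerator(Generator):
    def assignment(self, target: str, expression: str) -> str:
        return f"{target} = {expression}"


class JavaGenerator(Generator):
    def assignment(self, target: str, expression: str) -> str:
        # Same parameters as the Python version; only the emitted syntax differs.
        return f"{target} = {expression};"


def transform_compute(cobol_name: str, expression: str, gen: Generator) -> str:
    """Transform one COMPUTE statement without knowing which target is in use."""
    # Assumed naming rule for the example: WS-TOTAL becomes ws_total.
    target = cobol_name.lower().replace("-", "_")
    return gen.assignment(target, expression)


print(transform_compute("WS-TOTAL", "a + b", PythonGenerator()))  # ws_total = a + b
print(transform_compute("WS-TOTAL", "a + b", JavaGenerator()))    # ws_total = a + b;
```

In this arrangement, adding another target means adding one more `Generator` subclass, and `transform_compute` does not change.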
So when we transform COBOL to C, the C program better produce the same output.
11:23 All right. Here on the website is the original COBOL program for the Roman numerals. That is a little hard to read. Let me see if I can make it a little bigger.
11:38 So it's using the perform verb, and the perform verb in its older form didn't have parameters. There's another version of this COBOL program that I have that does linkage, where you can actually pass parameters in, but, well, this is the way I wrote COBOL back in the 70s. Okay, I'm showing my age. You assign the parameters to variables and you do the perform. Everything in COBOL is a global variable; there's no concept of local variables. So you move things over to global variables, do the perform, and it sets variables on its way out. This is just the old way of doing things, so I wanted to put that in there.
12:26 Here's the compute statement we were just looking at, which is essentially an assignment. You'll notice they don't have a remainder function. Anyway, let's move on, and we can skip those. Let me shrink it back down a little bit.
12:50 For each one of the languages, the key elements are expressions and statements. The expressions are computational kinds of things, and here are the basics in COBOL. So for example, here's multiplication, here's addition. They have some logical testing: is something numeric, ands and ors, is of type, and so on. And likewise with statements: there are quite a lot of statements we have implemented in COBOL. COBOL is one of the more mature languages we're dealing with. We're going to be working on this compute statement, which consists of a modifiable identifier with an optional subscript, say, ROUNDED, equals some expression.
This is the general form in the grammar. Not necessarily all of those pieces are being transformed, and if they aren't, they can be.
13:42 Now I want to show you this part. Let me again make it a little bit bigger. Video one focused on the parser. In the parser we could have things like the programmer semantic tree. I call it a PST. It's a tree structure holding the results of parsing. And here's a trace, which is an absolutely enormous file. It's got 58,000 lines in it, and you can keep expanding this and see more and more. This is searching the tree of trees; it's searching for a parse solution, and this shows you all the attempts it makes as it goes through trying to parse the file. And here is a dump. This is equivalent to the XML file you just saw, but in a slightly cleaner, more human-readable form. So that was in video number one.
14:51 Video number two was the interpreter, and you can see that this is the actual output from our program. And here are the metrics associated with that program, which we showed you in video number two. And here's the actual output from a real COBOL compiler. That's what everything is compared against to make sure that the results match exactly, as in diff shows no differences.
15:27 So I want to focus on this last section. That COBOL program you were just looking at is converted to C, Java, and Python. So what do they look like? Well, here's what they look like. For example, everything in COBOL is fixed size, so there's no concept of variable-length strings. Well, there is, kind of, but they're not used often. These strings still have all their spaces in them. So there are some slight differences. All the variables in COBOL are global, so here are all the variables.
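To illustrate why the strings in the generated code keep their spaces: a fixed-size COBOL alphanumeric field is padded with spaces on the right, and a value that is too long is truncated. The sketch below mimics that in Python. `move_to_field` is a hypothetical helper written for this illustration; it is not something the generated code is said to use.

```python
def move_to_field(value: str, width: int) -> str:
    """Store text in a fixed-width field: truncate overflow, pad right with spaces."""
    return value[:width].ljust(width)


short = move_to_field("XIV", 8)
long_ = move_to_field("MMMDCCCLXXXVIII", 8)

print(repr(short))  # 'XIV     '
print(repr(long_))  # 'MMMDCCCL'
print(len(short), len(long_))  # 8 8
```

Because the original program prints the padded field, a transformed program has to print the same trailing spaces for its output to match exactly.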
16:09 But you'll notice that there are no runtime libraries. There are no weird picture clauses in here, nothing like that. It's converted to a fairly reasonable-looking C program.
16:30 We have a similar kind of thing with Java. Very similar, really, because C# came from JDK11, so they ought to be similar. And here's the Python. Python demands that we use globals to access things, which makes things a little challenging, but we got it all working.
16:54 Okay, so those are the three output files. We also have this thing called the mapper. The mapper shows you a side-by-side comparison between the original COBOL on the left and the transformed C on the right. It's not perfect, because things roll off the screen, but the line numbers are accurate. So this line here on the left, this perform test to Roman, maps to line 47, which is up above here. Here's a call to test to Roman. So there's a mapping back and forth between these two. It's kind of a cute display, but it doesn't work very well when the line numbers change a lot. I don't know, maybe a scroll bar or something would help it. But this is a side-by-side mapper to show you how things mapped from one to the other. It is not one-to-one, but where there is a one-to-one, it shows you and highlights those changes for you.
18:12 Now I want to do something here. Here is the generated Python that we were just looking at. Remember, it had those globals in it and so on. I want to copy this thing and take it to our old friend tio.run. Yeah, we're in Python 3. So I paste it in here. Now remember, what you're looking at is Python, but it was not written as Python. You're looking at a COBOL program that has been transformed completely automatically.
There was no human intervention in this Python. And what we're going to do is run this thing and cross our fingers. Yay. You can see that we got the exact same result.
19:03 I'm a big JUnit fan, and there are huge suites of tests that go through all these things: first run the original COBOL and compare the results, transform it to C and compare the results, transform it to Java and compare the results, transform it to Python and compare the results, and so on, just to make sure that everything really, really works. All of this is completely automated at the moment.
19:36 Is it always going to be automatable? I don't think so. I think every transformation really should have some human intervention, some human involvement in it; you're going to get a much better result. Expecting a meat-grinder type of machine where you just pour in COBOL and get out sparkling, beautiful Java or C just isn't practical.
19:59 Now there's one big thing I want to say here, sort of in conclusion: almost all of this is going to be open sourced soon. The only part that's not being open sourced is the parser itself, just the little parser in the middle that has a patent associated with it. That is what I would consider intellectual property that belongs to Eagle Legacy Modernization. You can run these things, though. If you go to the website that I've been showing you and go back here, there's a "try the parser" and you can try the parser. You can run it.
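The compare-the-results step in those test suites can be sketched as follows. The suites in the video are JUnit; this sketch uses Python's standard `difflib` instead, and the captured outputs are made-up strings rather than real program runs.

```python
import difflib


def output_diff(expected: str, actual: str) -> list:
    """Return unified-diff lines; an empty list means the outputs match exactly."""
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="original", tofile="transformed", lineterm=""))


# Made-up captured outputs: the reference run and one run per target language.
reference = "1994 = MCMXCIV\n2024 = MMXXIV\n"
outputs = {
    "java": "1994 = MCMXCIV\n2024 = MMXXIV\n",
    "python": "1994 = MCMXCIV\n2024 = MMXXIIII\n",  # deliberately wrong
}

failures = [name for name, text in outputs.items() if output_diff(reference, text)]
print(failures)  # ['python']
```

An empty `failures` list corresponds to the "diff shows no differences" condition for every target.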
As you can see there, please don't use public files. But all this work on the grammars, the interpretation, transformation, and generation, PITG I call it, for parsing, interpretation, transformation, and generation, all of that stuff other than the parser itself is going to be public domain, because it's far too big for one little company to do all of these languages all at once. There's just an awful lot of stuff here, and it's going to take some community involvement to make that happen.
21:26 So anyway, I hope you appreciate this stuff. A lot of work has gone into this. I think there are some pretty incredible results: to have somewhere between 75 and 100 automatic transformations happening for several different programs, and that's just directions, I think is really cool.
21:50 Probably the two biggest things I'd like to say about this are these. Number one is that the code is modular. You deal with small pieces, not big pieces, so you don't have to think about, well, am I going to break the if statement if I change the while statement? I don't have to worry about whether I'm going to break C if I touch Java. You don't have to worry about things like that.
22:22 And second is this whole concept of things staying in sync. The analysis tool writers and the grammar writers often are in different rooms or buildings, and changes in one can really impact the other. Historically, because BNF files are essentially text files, it happens blindly. Maybe there are techniques, but there's no built-in automatic way of determining that, hey, wait a minute, they changed the grammar on me and my XPath expression is going to fail.
22:54 So anyway, those are the two big things that really distinguish this from other similar efforts. I hope you enjoyed this. Sorry I went so long. And thank you.