0:01 Hi, my name is Steve O'Hara and I'm the founder of Eagle Legacy Modernization LLC.
0:11 This video is part two of three. Part one talked about parsing computer programming languages. Part two talks
0:19 about dynamic analysis and interpretation, which is what we're going to do today. And part three covers
0:27 transformation and generation. All right. So if you have the desire to
0:35 do dynamic analysis, it's very challenging because you really don't
0:42 have access to a lot of the internals of compilers and interpreters to be able
0:49 to inject meaningful metrics into running programs.
0:55 And we made a decision that in order to really effectively collect dynamic analysis information about a running
1:03 program, we really needed to have our own interpreters. Now, interpreters are significantly easier than compilers or
1:11 transformation systems because you do have access to runtime information. They're
1:21 very much slower and not as, I
1:28 don't know, efficient as the regular compilers, but nonetheless
1:35 they can be very useful for collecting information on how a program runs, what values it sees, and, you
1:43 know, what exceptions happen. You can collect a lot of runtime information. So we decided to demonstrate this by actually creating
1:51 interpreters for, at this point, 29 different languages and counting. They
1:59 don't try or claim to cover all the features of all the languages.
2:08 Rather, our focus has been on covering the fundamentals of languages. In other words, it's better to build a base of
2:18 solid analysis tools rather than really focusing in on one particular
2:25 pair of languages, or one language, I suppose, like COBOL, which seems to be the main one that the industry deals
2:34 with, for good reason. So we're trying to build a collection of tools that allow
2:42 the future development of these full language interpreters and transformation systems
2:49 rather than trying to do all of them at once, which is essentially impossible.
2:54 Anyways, we have three programs that we're dealing with in this section.
2:59 We have a Roman numeral system. It converts numbers to Roman numerals and
3:05 back and does a lot of self-tests just to make sure that everything is working perfectly.
3:14 We wrote this little program equivalently in 29 different languages
3:23 that try to span pretty much the world of programming languages. In other words, we don't have every variation of every language included, but we have
3:32 all the main languages, everything from Scala to Rust to Python to COBOL to Fortran and so on.
3:42 The second program that we have is essentially an expression tester looking at the
3:50 things like order of precedence of operators, the behavior of specific operators and so on.
3:58 And the third one is the same for different kinds of statements like if statements and while statements and
4:04 switch statements and so forth. So together these three small programs are
4:12 written in all of the 29 languages, and they all provide a basis for experimenting with these. They do not
4:19 try to explore all the features in all the languages, just the fundamentals, the basics that are used in every
4:28 application that's written in those languages. So here's the current
4:34 state of them. There are four, but not for any really good reason. The
4:42 first one is the expressions tester, in this case for Delphi, which used to be called Pascal.
4:48 Ignore this buzz, sorry. Then the Roman numeral program as well as the
4:55 statements program. So each of these three, expressions, Roman numeral, and
5:02 statements, is written in each of those languages. And for no particular reason, we're going to be focusing on Julia for several of these today.
5:13 And for example, here is the Julia version of the Roman numeral converter.
5:21 It has four functions in it. A converter to Roman numerals and a tester for that.
5:28 A converter from Roman numerals and a tester for that. As well as a small main
5:34 test program. So this logic is implemented fairly similarly across all of
5:43 the 29 languages.
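To make the shape of that program concrete, here is a minimal Python sketch with the same layout described above: two converters, a tester for each, and a small main. It is not the Julia source shown in the video; the function names, the greedy table approach, and the chosen test values are all assumptions made for illustration.

```python
# Hypothetical Python sketch; not the Julia program shown in the video.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(num):
    """Convert a positive integer to a Roman numeral string (greedy)."""
    out = []
    for value, symbol in ROMAN_PAIRS:
        while num >= value:
            out.append(symbol)
            num -= value
    return "".join(out)

def from_roman(text):
    """Convert a canonical Roman numeral string back to an integer."""
    total = 0
    pos = 0
    for value, symbol in ROMAN_PAIRS:
        while text.startswith(symbol, pos):
            total += value
            pos += len(symbol)
    return total

def test_to_roman():
    """Self-test: a few hand-checked conversions."""
    expected = {1: "I", 4: "IV", 9: "IX", 14: "XIV", 1994: "MCMXCIV"}
    return all(to_roman(n) == r for n, r in expected.items())

def test_from_roman():
    """Self-test: round trip over a range of values (range is an assumption)."""
    return all(from_roman(to_roman(n)) == n for n in range(1, 4001))

def main():
    print("to_roman:", "ok" if test_to_roman() else "FAILED")
    print("from_roman:", "ok" if test_from_roman() else "FAILED")

if __name__ == "__main__":
    main()
```

Because the program only needs loops, conditionals, string building, and function calls, it exercises exactly the fundamentals that an interpreter has to get right first.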
5:48 The expressions tester, we're going to show you that in more detail here in just a moment, and
5:57 the statements likewise; we'll show you more details on that in just a moment. Okay.
6:04 All right. So, one of the things I want to quickly cover with you is that languages are kind of funny: the
6:15 behavior of languages is not always obvious. One of my favorite examples of that is the percent sign for
6:23 calculating the remainder of division or modulus.
6:27 And it turns out that there's a difference between remainder and
6:34 modulus. And in math there isn't really much of a difference, but it is semi-significant if you start dealing
6:42 with negative numbers. Okay, so I've picked out a couple of examples
6:48 here. This first one is in C, and I'm using a public tool, or I guess
6:55 I suppose there's a commercial version as well, called tio.run.
7:00 And tio.run is sort of a playground; they have hundreds of different programming languages that
7:08 they support, and you can run little programs. So here I've written a tiny, essentially four-line
7:16 program to do the percent operator in C
7:23 using this tio.run website.
7:30 So let's run it. See what it does.
7:34 So if you run this thing, you can see that five when divided by three gives a remainder of two.
7:41 But if you have minus 5 divided by 3, you get minus 2,
7:47 and so on. So 2, -2, 2, and -2. Now I want to show you the exact same thing in Python.
7:57 Looks pretty similar, right? 5 and 3, minus 5, and so on. You would think that between C and Python
8:05 they would get the same results. If you'll recall, this one gave results of 2, -2, 2, and -2.
8:14 But in Python, we get something different. We get 2, 1, -1, and -2.
8:23 Okay, so this, the Python result, is the true modulus function, whereas this, the C result, is the true remainder function.
8:33 They are slightly different and that came as a bit of a surprise to me.
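The difference can be reproduced in a few lines of Python. This is a sketch, not code from the video: the function names are made up, and the four operand pairs are an assumption chosen because they reproduce the result sequences quoted here. The third function, which never returns a negative value, is included because it yields the 2, 1, 2, 1 sequence that also comes up in this video.

```python
def rem_truncated(a, b):
    """C-style remainder: the result takes the sign of the dividend a."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r

def mod_floored(a, b):
    """Python's built-in %: the result takes the sign of the divisor b."""
    return a % b

def mod_euclidean(a, b):
    """Variant whose result is never negative."""
    return a % abs(b)

# Assumed operand pairs: (5, 3), (-5, 3), (5, -3), (-5, -3)
CASES = [(5, 3), (-5, 3), (5, -3), (-5, -3)]

for name, fn in [("truncated", rem_truncated),
                 ("floored", mod_floored),
                 ("euclidean", mod_euclidean)]:
    print(name, [fn(a, b) for a, b in CASES])
# truncated [2, -2, 2, -2]
# floored [2, 1, -1, -2]
# euclidean [2, 1, 2, 1]
```

All three agree when both operands are positive, which is why the difference tends to go unnoticed until negative values show up.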
8:39 Anyways, there's a third set of them. ALGOL 68 and PL/I and some others
8:47 have yet another value for the exact same modulus function.
8:54 It returns 2, 1, 2, and 1. So this kind of exploration is useful and
9:01 important when you start dealing with many different languages at the same time. Okay. So modulus is a funny little
9:11 beast, but it's not just modulus that's funny. Suppose you have a string with, I don't know, 10 characters and
9:18 you ask for a substring starting at character position five and ask for, I don't know, 100,000
9:27 characters. What happens? Does it crash and burn? Does it just give you to the
9:33 end of the string? The behavior is unknown without researching it in the
9:42 online documentation, but the best way to find out is to actually write a little program to test it and see what it does. So that's what
9:50 we're doing here. Return values from a function are odd, too. In languages like
9:58 Python and C and Java, there's an explicit return keyword. Languages like
10:05 Lisp and Haskell simply return the last value inside of a
10:12 function. You don't have to use the return keyword. You can in some cases, but generally not. And Delphi is also
10:22 different. In Delphi you assign the return value to the name of the function as if it were a variable. This
10:31 is the kind of discrimination and determination that we're dealing with, trying to understand the behavior across different languages.
10:42 Okay.
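Going back to the substring question, this is the kind of small probe program that answers it, shown here for Python only. It is a sketch: the helper names are hypothetical, and a zero-based start position is assumed since the video does not say which convention is meant. Other languages may behave differently, which is the whole point of probing.

```python
def probe_substring(text, start, length):
    """Ask for far more characters than exist and report what happens."""
    try:
        return ("ok", text[start:start + length])
    except Exception as exc:          # record the failure instead of crashing
        return ("error", type(exc).__name__)

def probe_index(text, position):
    """Ask for a single character that does not exist."""
    try:
        return ("ok", text[position])
    except Exception as exc:
        return ("error", type(exc).__name__)

print(probe_substring("0123456789", 5, 100_000))  # ('ok', '56789')
print(probe_index("0123456789", 100_000))         # ('error', 'IndexError')
```

In Python the oversized slice is silently clamped to the end of the string, while the single out-of-range index raises an exception, so even within one language the two cases differ.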
10:45 So, what we're going to do here is show you how these things work. How
10:54 do we do dynamic analysis? How do we collect this information?
10:59 So here, for example, is the Julia programming language, and this is the while statement in Julia. This, in
11:07 fact, is how it's parsed, and if you want more information on that you have to go to the first video. This is how it's done,
11:14 and then we have a collection of different types of metrics that we can collect. So we have the
11:24 generic concept of a for loop, and a for loop talks about things like how many times it was executed, how many times
11:33 there was a break statement inside of it, how many times there was a continue statement inside of it. So we can
11:39 collect various metrics on for loops in a language independent manner. Okay.
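A language-independent loop metric record could look roughly like the following. This is a sketch under assumptions: the class and field names are invented here and are not taken from the tool shown in the video. The loop below is instrumented by hand just to show what the counters mean.

```python
from dataclasses import dataclass

@dataclass
class LoopMetrics:
    """Hypothetical language-neutral counters for one loop statement."""
    executions: int = 0   # times the loop statement itself was entered
    iterations: int = 0   # body iterations across all executions
    breaks: int = 0       # times a break ended the loop
    continues: int = 0    # times a continue skipped the rest of the body

metrics = LoopMetrics()

metrics.executions += 1
for i in range(10):
    metrics.iterations += 1
    if i % 2 == 0:
        metrics.continues += 1
        continue
    if i == 7:
        metrics.breaks += 1
        break

print(metrics)
# LoopMetrics(executions=1, iterations=8, breaks=1, continues=4)
```

Nothing in the record refers to a particular source language, so the same structure can describe a loop written in any of them.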
11:48 And then here is the logic for interpreting or running the while statement in Julia. It's kept
11:57 in the same location as the parse information, in what we call a program grammar or programmar.
12:06 And this is the actual code. This is what's actually being run. And a fair amount of this is for metrics.
12:16 Okay. So, we're going to collect some metrics. We're going to perform some actions. So, we're going to have a result. And the result keeps track of
12:24 whether it was a break or a continue and that kind of thing. Okay.
12:31 So, this is it. This is the entire logic for running that Julia while statement. And it's very similar
12:39 across languages. Other languages have a very similar structure to it. We don't have to know in this logic what kind
12:48 of a statement we're performing inside the while loop or block of statements.
12:54 It's all handled by this recursive processing of statements. Okay. So
13:02 this little line here, this subtle little line, is the one that's actually running these statements
13:08 inside the while loop. And then we check for a break result. Was there a break found there, or was there a
13:16 continue found there? There are different behaviors, but you can see that we're tracking these actions and we can count them. Okay.
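The structure just described (run the body recursively, look at the result, count breaks and continues) can be sketched in Python as follows. This is not the code on screen: the statement classes, the result constants, and the use of Python callables for expressions are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, List

NORMAL, BREAK, CONTINUE = "normal", "break", "continue"

@dataclass
class Assign:
    name: str
    expr: Callable[[dict], object]

@dataclass
class If:
    cond: Callable[[dict], bool]
    body: List[object]

@dataclass
class Break:
    pass

@dataclass
class Continue:
    pass

@dataclass
class While:
    cond: Callable[[dict], bool]
    body: List[object]
    metrics: dict = field(default_factory=lambda: {
        "executions": 0, "iterations": 0, "breaks": 0, "continues": 0})

def run_block(stmts, env):
    for stmt in stmts:
        result = run_stmt(stmt, env)
        if result != NORMAL:
            return result            # pass break/continue up to the loop
    return NORMAL

def run_stmt(stmt, env):
    if isinstance(stmt, Assign):
        env[stmt.name] = stmt.expr(env)
        return NORMAL
    if isinstance(stmt, If):
        return run_block(stmt.body, env) if stmt.cond(env) else NORMAL
    if isinstance(stmt, Break):
        return BREAK
    if isinstance(stmt, Continue):
        return CONTINUE
    if isinstance(stmt, While):
        return run_while(stmt, env)
    raise TypeError(f"unknown statement: {stmt!r}")

def run_while(stmt, env):
    stmt.metrics["executions"] += 1
    while stmt.cond(env):
        stmt.metrics["iterations"] += 1
        result = run_block(stmt.body, env)   # the recursive step
        if result == BREAK:
            stmt.metrics["breaks"] += 1
            break
        if result == CONTINUE:
            stmt.metrics["continues"] += 1
    return NORMAL

loop = While(
    cond=lambda env: env["n"] < 10,
    body=[
        Assign("n", lambda env: env["n"] + 1),
        If(lambda env: env["n"] % 2 == 0, [Continue()]),
        If(lambda env: env["n"] == 7, [Break()]),
        Assign("total", lambda env: env["total"] + env["n"]),
    ],
)
env = {"n": 0, "total": 0}
run_block([loop], env)
print(env)           # {'n': 7, 'total': 9}
print(loop.metrics)  # {'executions': 1, 'iterations': 7, 'breaks': 1, 'continues': 3}
```

Note that `run_while` never inspects what kind of statements are in the body. It only looks at the result that comes back, which is what lets the same logic be reused across languages.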
13:26 Then we say we completed the loop. And what's interesting here, let me bring
13:32 that up for you as well, is that we can see
13:44 the XML file.
13:49 Let me bring that up for Julia.
14:00 Julia Roman
14:04 here. Okay. So, here we're running that
14:11 Roman numeral converter. There are while statements in there and conditionals in there. And here is an XML file. Now,
14:19 XML files are really more for computers than humans, so it's a little bit difficult to read. I made it a little bit
14:27 bigger, but you can see that there are things like calls to the various functions, and you can track where it's called from. It was only called one time, from line 95.
14:39 You can see the argument types.
14:44 This very specifically gives you the patterns of the calls that were used. You can see variables that were assigned, the scopes.
14:56 This is a variable called expected.
14:59 It was assigned four times, and all four times it was assigned a string value. Here is an assignment to an integer.
15:09 It was assigned 4,03 times. The average value. The minimum was one. The maximum
15:15 was 4,000 for that assignment of this variable, num, and so on.
15:21 So all these statistics or metrics are collected into a language-neutral form and
15:31 will be used for transformation, which we'll get to in part three. Okay.
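As a rough illustration of per-variable assignment metrics written out as XML, here is a Python sketch using the standard `xml.etree.ElementTree` module. The class, the element names, and the attribute names are all assumptions; the actual XML layout shown in the video is not reproduced here, and the sample data is invented.

```python
import xml.etree.ElementTree as ET

class AssignmentMetrics:
    """Hypothetical per-variable statistics; names are illustrative only."""

    def __init__(self, name):
        self.name = name
        self.count = 0            # total assignments seen
        self.types = {}           # type name -> number of assignments
        self.numeric_count = 0
        self.numeric_total = 0
        self.minimum = None
        self.maximum = None

    def record(self, value):
        self.count += 1
        type_name = type(value).__name__
        self.types[type_name] = self.types.get(type_name, 0) + 1
        # Track min/max/average for plain numbers only (bool is excluded).
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            self.numeric_count += 1
            self.numeric_total += value
            self.minimum = value if self.minimum is None else min(self.minimum, value)
            self.maximum = value if self.maximum is None else max(self.maximum, value)

    def average(self):
        if self.numeric_count == 0:
            return None
        return self.numeric_total / self.numeric_count

    def to_xml(self):
        elem = ET.Element("variable", name=self.name, count=str(self.count))
        for type_name, n in sorted(self.types.items()):
            ET.SubElement(elem, "type", name=type_name, count=str(n))
        if self.numeric_count:
            ET.SubElement(elem, "range", min=str(self.minimum),
                          max=str(self.maximum), average=str(self.average()))
        return elem

# Invented sample data: an integer variable and a string variable.
num = AssignmentMetrics("num")
for value in range(1, 4001):
    num.record(value)

expected = AssignmentMetrics("expected")
for text in ["I", "IV", "IX", "XIV"]:
    expected.record(text)

for m in (num, expected):
    print(ET.tostring(m.to_xml(), encoding="unicode"))
```

Recording the observed types alongside the value range is what makes this kind of data useful later: a variable that only ever held integers between 1 and 4,000 can be given a much more specific type in a target language.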
15:39 Now I want to show you these expressions and so on running
15:48 live. Okay. So here is the
15:56 Julia expression tester, and in fact it has, I don't know, a little more than a
16:03 screen's worth of tests. It has some boolean type tests. It has some numeric type tests including some double floats,
16:14 a couple of string tests, or maybe just one string test. It's doing a collection of tests, but this is all
16:22 running in JUnit. And what it's doing is it's actually generating
16:28 another Java file. Okay. So, this is a generated file and it's generated by
16:36 this program here. So, when I run this program and let me just go ahead and do that.
16:46 It's going to perform all those tests.
16:50 Oh, this is the Roman numeral one. My apologies for that. Let's run this one.
17:00 Okay, so this goes out and it runs each of those tests separately and independently. There's an inverse test to make sure that the fail messages come
17:09 out right. So, it pretends that it works and then it pretends to fail. That's why there's an
17:16 inverse for each of these. So each one of these tests is actually performed twice.
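The run-it-twice idea can be sketched like this in Python. It is an assumption-laden illustration, not the generated JUnit code from the video: the function names and message formats are invented. Each check runs once with the correct expected value and once with a deliberately wrong one, so the failure path gets exercised too.

```python
def report(name, actual, expected):
    """Return a pass or fail message for one comparison."""
    if actual == expected:
        return f"PASS {name}"
    return f"FAIL {name}: expected {expected!r}, got {actual!r}"

def run_twice(name, actual, expected, wrong_expected):
    """Normal run must pass; inverse run (wrong expectation) must fail."""
    normal = report(name, actual, expected)
    inverse = report(name, actual, wrong_expected)
    ok = normal.startswith("PASS") and inverse.startswith("FAIL")
    return ok, normal, inverse

print(run_twice("precedence", 2 + 3 * 4, 14, 20))
# (True, 'PASS precedence', 'FAIL precedence: expected 20, got 14')
```

The inverse run guards against a test harness that reports success no matter what, which would otherwise make every passing result meaningless.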
17:22 So just simple checks, very simple, for expressions. We're trying to
17:29 experiment with those. The statements tester is a similar kind of thing. Let me run it.
17:45 It generates two versions of each one of these little programs, or programlets, I suppose. So this is
17:56 generated by what we just saw and then run.
18:01 Okay. And each one of these is doing something fairly simple. This is testing if-then-elses or if-elses. There's
18:09 another one, if I keep going, that's testing loops. How do loops perform?
18:17 Here's a while test. Here are more while tests and so on. Here's a for
18:25 loop test. How do for loops behave?
18:29 How do you do a reverse loop? For example, if you want to go from 10 down to one, what's the syntax of
18:37 how that works? That's what this is all testing. And then this is the Julia version that we already saw for the Roman numeral converter.
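For the reverse-loop question raised a moment ago, here is how the answer looks in Python specifically, as one small data point. The syntax differs from language to language, which is why each one gets its own test; this sketch says nothing about the others.

```python
# Counting from 10 down to 1 with a for loop: the stop value (0) is
# exclusive, and the step of -1 makes the range run backwards.
countdown = []
for i in range(10, 0, -1):
    countdown.append(i)

# The same thing written as a while loop.
manual = []
i = 10
while i >= 1:
    manual.append(i)
    i -= 1

print(countdown)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```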
18:49 And one of the things we do frequently is, let me grab all this, copy it,
18:59 and let me go over to
19:05 here. And again, this is tio.run, which is not ours. It's a commercial or
19:13 public product. And let's paste that Roman numeral program. And we run it.
19:20 So we're essentially running this against a production Julia compiler.
19:27 And the output is absolutely, completely identical across all 29 languages,
19:34 deliberately. This output is the same for the Fortran version, the
19:42 COBOL version, the Delphi version, the Lisp version, the assembler version, and so on. Identical output, and it's
19:51 matched, and they're all run against a production compiler system. So we have our interpreter
19:59 and we have the real production version of these things, and we do a side-by-side comparison, automatically
20:06 of course, between the two to make sure that all the behaviors are completely correct. Same thing for all
20:14 the expressions and statements: we run those against a production system like tio.run
20:21 to make sure that all of them are perfect as well.
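An automated side-by-side comparison of two captured outputs might look like the following Python sketch, using the standard `difflib` module. How the outputs are actually captured and compared in the tool shown here is not stated, so the function name and the sample strings are assumptions.

```python
import difflib

def compare_outputs(interpreter_output, reference_output):
    """Return unified-diff lines; an empty list means the outputs match."""
    ours = interpreter_output.splitlines()
    reference = reference_output.splitlines()
    return list(difflib.unified_diff(
        reference, ours,
        fromfile="reference", tofile="interpreter", lineterm=""))

# Invented sample outputs for illustration.
same = compare_outputs("I\nII\nIII\n", "I\nII\nIII\n")
different = compare_outputs("I\nIIX\nIII\n", "I\nII\nIII\n")

print("identical" if not same else "\n".join(same))
print("identical" if not different else "\n".join(different))
```

Because every language version is designed to print the same text, one reference output can serve as the comparison target for all of them.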
20:26 There's one last little thing I'd like to show you in this block. If you'll recall
20:33 from part one, on the parser, there was a parser debugger.
20:38 We also have an interpreter runner debugger that does a similar kind of
20:45 thing. Again, it's fairly primitive. Lots of, I don't know, opportunity for
20:53 enhancement, I guess. What we can do is actually run this thing and step through it
21:01 whichever way we want, and you can see the action as it's going. So it is a debugger. It's not very
21:11 robust. You can set a breakpoint in the source someplace and tell it to run until that point, and
21:20 it'll get you there, and then you can keep going from that point, and you can see both the list of functions
21:28 as well as the variables as they're assigned their values, and you can see the behavior.
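The breakpoint-and-step idea can be illustrated with a very small Python sketch. It is far simpler than the debugger in the video and makes several assumptions: the class name is invented, the program is a straight-line list of (line number, action) pairs with no control flow, and a breakpoint on the line where execution is currently paused is ignored so that a paused run can resume.

```python
class TinyDebugger:
    """Hypothetical stepper over a straight-line program."""

    def __init__(self, program):
        self.program = program      # list of (line_number, callable(env))
        self.env = {}               # variable name -> current value
        self.pc = 0                 # index of the next statement to run
        self.breakpoints = set()    # line numbers to stop before

    def finished(self):
        return self.pc >= len(self.program)

    def step(self):
        """Run one statement; return its line and a variable snapshot."""
        line, action = self.program[self.pc]
        action(self.env)
        self.pc += 1
        return line, dict(self.env)

    def run(self):
        """Run until just before the next breakpoint or the program end."""
        executed = []
        while not self.finished():
            line, _ = self.program[self.pc]
            if line in self.breakpoints and executed:
                break
            executed.append(self.step()[0])
        return executed

program = [
    (1, lambda env: env.update(x=1)),
    (2, lambda env: env.update(y=env["x"] + 1)),
    (3, lambda env: env.update(z=env["x"] + env["y"])),
]

dbg = TinyDebugger(program)
dbg.breakpoints.add(3)
print(dbg.run())    # [1, 2]
print(dbg.env)      # {'x': 1, 'y': 2}
print(dbg.step())   # (3, {'x': 1, 'y': 2, 'z': 3})
```

Having the interpreter under your own control is what makes this cheap to build: pausing is just a matter of not executing the next statement yet.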
21:39 So it's a nice little tool. So what we talked about just now is
21:46 creating interpreters for 29 different languages, relatively
21:53 basic in the sense of covering the fundamentals of the language. Not trying to cover every detail, not trying to build the full thing. That's a very huge
22:02 project. But we're trying to make sure that all the basics are handled, that all the expressions are correct, the statements are correct, functions or methods are
22:11 callable, and so on, and then comparing them with production systems to prove that the behaviors
22:19 that we're interpreting are identical to the actual production systems. The next video in the series will talk about
22:27 transformation and generation. So transforming from these 29 or whatever languages forward
22:35 to three or four output languages. Thank you.