Phase III: Scouting
Last week Mike and I were statically analyzing python code to detect and graph all function calls, highlighting calls to the libraries we want to upgrade. We worked out how to wrap a function call in their code such that we can see the inputs and outputs while leaving the function itself alone. To do this we needed to write a launcher that runs their code and instantiates the wrapper at the same time. We figured out how to take those inputs and outputs and plug them into a function that writes unit test scripts. This put out a slew of nose unit tests, one test for every time the function got used during the run of their code. We verified that the unit tests can work with simple data structures.
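As a rough sketch of that wrapping idea (the names here are mine, not the client's), the launcher installs a recording wrapper around the library function before handing control to the client code:

import functools

# Minimal sketch of the recording wrapper (illustrative names). The launcher
# installs it before running the client code, so every call's inputs and
# outputs are captured while the wrapped function itself is left alone.
def record(func, calls):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        calls.append((func.__name__, args, kwargs, result))
        return result
    return wrapper

# In the launcher, before the client code runs:
# import pandas as pd
# calls = []
# pd.merge = record(pd.merge, calls)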
Our goal this week was to better determine the scope of the job, dig deeper into the complexities of the data and the code we were dealing with, and start wrapping up some loose ends in readiness for a demonstration of the progress we’ve made so far.
After we got our static call graphs to a point where we were happy with them, we turned our attention to creating a matrix showing which modules called which libraries. The point, of course, is to define a coverage target for our regression test scripts. Every call directly to pandas or numpy is a call we want to exercise, at least once, and possibly hundreds of times if the data sent through the call is highly variable.
To do this we simply made a nested list of tuples in python, with the library call as the left value and a nested list of tuples on the right – each tuple in the nested list containing the client code file and the line number of the library call. E.g.:
[ ("clientlib.x", 150),
  ("clientlib.x", 167),
  ("clientlib.y", 38) ],
[ ("clientlib.z", 312) ]
We then serialized the nested list to JSON (which is just a fancy way to say we wrote it to a file). We can easily use python to deserialize the JSON file back into an object, and then manipulate and display the data however we want.
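A minimal sketch of that round trip, with file and variable names of my own choosing:

import json

# Serialize the call map to a file, then read it back (names are illustrative).
call_map = [
    [("clientlib.x", 150), ("clientlib.x", 167), ("clientlib.y", 38)],
    [("clientlib.z", 312)],
]

with open('library_calls.json', 'w') as f:
    json.dump(call_map, f, indent=2)

with open('library_calls.json') as f:
    loaded = json.load(f)
# Note that JSON has no tuple type, so the round trip hands the tuples back as lists.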
I make an analogy
I think we can all agree that while the tip of the iceberg sure looks pretty, it’s the rest of the iceberg that will sink your ship. Static analysis will only take you so far. To get the bigger picture you need to see what really happens when you run the code.
Dynamic analysis with PyCallGraph
I made a promise to revisit pycallgraph when we had a working environment and could run the client’s code. I am a man of my word. We were able to start pycallgraph and run the code within the analyzer! What it showed us was… the rest of the iceberg. Far from the handful of calls into pandas that the static analysis had led us to expect, we saw over a million calls from over 130 client modules into nearly 100 pandas modules, and a few million more calls to several hundred pandas modules originating within the pandas library itself. Reality set in, if it hadn’t already: we have a lot of work ahead.
Making readable output
I really wanted to abstract away the internal pandas calls, so I decided to get pycallgraph to save the DOT file that represented the graph. I couldn’t figure out how to do it from the documentation, so I shot an email to Gerald Kaszuba, the creator of pycallgraph. He very kindly answered my question by pointing me to the output.generate() function. In my python script that runs the client code within pycallgraph, I added a call to generate() and saved the resulting DOT code to a file.
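A minimal sketch of that script, assuming a placeholder entry point for the client’s code and using pycallgraph’s GraphvizOutput plus the generate() call Gerald pointed me to:

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

import client_main  # placeholder for the client's real entry point

output = GraphvizOutput()

# Run the client code inside the analyzer.
with PyCallGraph(output=output):
    client_main.run()  # assumed entry point

# generate() returns the DOT source that GraphvizOutput would normally hand
# to Graphviz; save it so we can post-process the graph ourselves.
with open('callgraph.dot', 'w') as f:
    f.write(output.generate())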
Then I wrote my first serious piece of python: nearly 200 lines, nearly from scratch, to process that DOT file. It works out which calls are made by pandas modules and removes them from the DOT file. Then it builds a list of all the pandas modules I’m certain I want to keep, and removes any node definitions that aren’t on that list. That cleans up the graph a lot.
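The real script is a lot longer, but the heart of it boils down to something like this sketch (the keep-list contents and file names are placeholders):

# Sketch of the DOT pruning step (keep-list and file names are placeholders).
KEEP = {'pandas', 'pandas.core.frame', 'pandas.core.series'}

def prune_dot(src_path, dst_path):
    kept = []
    with open(src_path) as f:
        for line in f:
            stripped = line.strip()
            if '->' in stripped:
                # Drop edges whose tail is a pandas module: these are the
                # internal pandas-to-pandas calls we want to abstract away.
                tail = stripped.split('->', 1)[0].strip().strip('"')
                if tail.startswith('pandas'):
                    continue
            elif stripped.startswith('"pandas'):
                # Drop node definitions for pandas modules not on the keep-list.
                name = stripped.split('"')[1]
                if name not in KEEP:
                    continue
            kept.append(line)
    with open(dst_path, 'w') as f:
        f.writelines(kept)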
While I was in the DOT code I noticed that pycallgraph appends a number to each node representing the total number of times it was called. It also appends a number to each edge representing the number of times the tail module called the head module. This is perfect for heat mapping, so I updated my processing script to find these bits of data and print them out to a couple of CSV files. I can import them into Excel and create pretty charts showing which libraries ought to take priority as we continue.
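In sketch form the extra pass looks roughly like this, assuming the counts show up in the DOT labels as “calls: N” (the CSV file names are mine):

import csv
import re

# Pull per-node and per-edge call counts out of the DOT labels and write
# them to CSVs for charting (assumed label format; illustrative file names).
CALLS = re.compile(r'calls:\s*(\d+)')

def dump_counts(dot_path):
    node_rows, edge_rows = [], []
    with open(dot_path) as f:
        for line in f:
            match = CALLS.search(line)
            if not match:
                continue
            count = int(match.group(1))
            if '->' in line:
                left, right = line.split('->', 1)
                tail = left.strip().strip('"')
                head = right.split('[', 1)[0].strip().strip('"')
                edge_rows.append((tail, head, count))
            else:
                node_rows.append((line.strip().split('"')[1], count))

    with open('node_calls.csv', 'w') as f:
        csv.writer(f).writerows([('module', 'calls')] + node_rows)
    with open('edge_calls.csv', 'w') as f:
        csv.writer(f).writerows([('caller', 'callee', 'calls')] + edge_rows)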
Unit testing with pickle
Enough about me. Mike was a superhero this week. He figured out how to use a python tool called pickle to serialize and deserialize complex inputs and return values, which we can then feed to our unit test scripts. Before he did all that work our AOP wrappers worked, but only with simple data structures like text, integers, and lists. Now we can save entire classes as a pickle file, instantiate them during the unit test run, and use them as input to the pandas function we are testing. If the pandas function returns a class object, we saved that as a pickle during the initial run as well, so we just instantiate the pickle object and compare it with the object that got returned.
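As a sketch of the shape of it (file names, the wrapped function, and the generated test are all illustrative), the wrapper pickles each call’s inputs and return value during the recorded run, and the generated test reloads them, replays the call, and compares:

import pickle
import pandas as pd
# In older pandas this helper lives in pandas.util.testing; newer versions
# expose it as pandas.testing.assert_frame_equal.
from pandas.util.testing import assert_frame_equal

# During the recorded run, the wrapper pickles each call's inputs and result.
def record_call(args, kwargs, result, prefix):
    with open(prefix + '_args.pkl', 'wb') as f:
        pickle.dump((args, kwargs), f)
    with open(prefix + '_result.pkl', 'wb') as f:
        pickle.dump(result, f)

# A generated nose-style test then reloads the pickles and replays the call.
def test_merge_call_0001():
    with open('merge_0001_args.pkl', 'rb') as f:
        args, kwargs = pickle.load(f)
    with open('merge_0001_result.pkl', 'rb') as f:
        expected = pickle.load(f)
    actual = pd.merge(*args, **kwargs)
    # DataFrames need the pandas testing helper rather than plain ==.
    assert_frame_equal(actual, expected)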
It is so slick and nice that I’m thinking of stealing it. I have to create a (much smaller) list of unit tests that drive the production code rather than the target libraries, and I think that I’ll use pickle to set up my expected results.
One other product of Mike’s work this week was tracebacks, from the same AOP wrappers that are driving the unit test creation. He can see the call stack all the way up from the pandas or numpy call he wrapped. We decided to look at all the files mentioned in the stack trace to see if we could follow the logic of a call.
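The wrapper only needs a couple of extra lines to capture that stack; a sketch, with illustrative names:

import functools
import traceback

# The same recording wrapper can also capture the call stack that led to
# the wrapped pandas or numpy call.
def trace_calls(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # format_stack() lists every frame from the client code down to here;
        # the last entry is this wrapper itself, so drop it.
        print(''.join(traceback.format_stack()[:-1]))
        return func(*args, **kwargs)
    return wrapper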
Our investigation revealed a few files that appeared in those stack traces but ran no code of their own. They simply imported the function, and were called in turn by the next file in the chain. We took a closer look at these files. Why would someone import a function but never use it?
The code in these files carried comments along the lines of “if the pandas code that is being used is version X, then we replace its functionality with some code in here so that our production system won’t break.” The pandas version we were running didn’t match the check, so none of the replacement code ran. But it could have, and the production code had all been rewritten to import these functions from these ghost wrappers around older versions of pandas. Someone had been here before, fixing the issues that broke production code after these libraries were upgraded.
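To make the pattern concrete, a ghost wrapper of this kind looks something like the following sketch (purely illustrative; this is not the client’s code, and the version check is made up):

import pandas as pd

# Production code imports merge from this module instead of from pandas;
# the module only substitutes a local fix for the pandas versions that
# needed it.
if pd.__version__.startswith('0.10'):
    def merge(left, right, **kwargs):
        # ...apply the workaround for the old behaviour here, then delegate.
        return pd.merge(left, right, **kwargs)
else:
    merge = pd.merge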
We felt a welcome moment of recognition. Here was a fellow traveler. We have his name. Maybe we will be in touch.
I will definitely be in touch with you. See you next week.