Phase I: The Mission
My partner Mike is the experienced coder, and I’m the experienced tester and analyst. I also have good knowledge of dot (part of the graphviz family of visualization tools). But what I don’t have is experience writing unit test code, or analyzing source code to divine possible behaviors. I’m going to be learning a ton in the next few weeks and want to share what I learn as I go.
Mike and I have an amazing support team that continually feeds us useful ideas to keep on track and move forward. The first head-start we got was a proof-of-concept program that showed us how to do two things we need to do.
First, it gave us a way to walk the dependency tree for a piece of python code. It did this in a clever way, by wrapping each call in a reporter method that reported the source of each method and all of its arguments.
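The wrapping idea can be sketched with a plain decorator. This is my own minimal illustration of the technique, not the proof-of-concept's actual code; the function names here are invented for the example:

```python
import functools

def reporter(func):
    """Wrap a callable so every invocation reports its name and arguments.

    A hedged sketch of the call-wrapping technique: the real
    proof-of-concept's reporter is richer than this.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"CALL   {func.__module__}.{func.__qualname__} "
              f"args={args} kwargs={kwargs}")
        result = func(*args, **kwargs)
        print(f"RETURN {func.__qualname__} -> {result!r}")
        return result
    return wrapper

@reporter
def tax(amount, rate=0.05):
    # A made-up stand-in for a method in the dependency tree.
    return amount * rate

tax(100)
```

Every call through the wrapper leaves a breadcrumb, so running the code once traces out which methods were actually exercised and with what arguments.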
Second, it showed us how to use Aspect-Oriented Programming (AOP) to intercept calls to a particular library, in this case pandas, and run code to report all the data sent to and returned from the called method. We will not have to insert code into either the library or the production code to monitor the traffic between them. This is how we are going to test the coverage we get on the library and confirm that it functions properly.
Scouting out and visualizing dependencies
This week we have focused on getting a good handle on the dependencies. We have tried three different methods, and have homed in on one that we think will give us the best bang for our buck.
Textual AST Walker
The first method was to use python's Abstract Syntax Tree (AST) module to build an object describing the structure of the code (using the getattr method on the name of the class to resolve attributes), and walk that tree node by node, reporting each node's data (methods, args, data, etc.) to show the provenance of each piece of executed code. The script we used puts out a text file containing a ton of data about the python code that gets analyzed.
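A bare-bones version of that walk looks like this. The source snippet below is an invented stand-in for the kind of module we are analyzing, and the report format is my own, not the script's:

```python
import ast

# Hypothetical module under analysis, as a string for the sketch.
source = """
import pandas

def load(path):
    return pandas.read_csv(path)

def summarize(path):
    frame = load(path)
    return frame.describe()
"""

tree = ast.parse(source)
report = []
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        args = ", ".join(a.arg for a in node.args.args)
        report.append(f"def {node.name}({args})")
    elif isinstance(node, ast.Call):
        # ast.unparse needs python 3.9+
        report.append(f"  call {ast.unparse(node.func)}")

print("\n".join(report))
```

Even this toy version surfaces the interesting facts: which functions exist, what they call, and where pandas enters the picture. The real script dumps far more per node, which is why the text output gets so large.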
We decided that we wanted a visual representation instead, one that would give us a look at the dependencies hierarchically. So our support team recommended looking into a tool called pycallgraph, which puts out dot code and PNG files.
The next method we tried was to use pycallgraph. Unfortunately it is a dynamic analysis tool, which means running the code in place and analyzing it as it runs. This has the enormous advantage of having unfettered access to all the methods in use, since they are currently running, and so can produce a much richer data set. The enormous disadvantage is that it requires running code, and without a tightly controlled test environment we cannot use it. Since we are not fully operational here this week, we have to put pycallgraph away. If you are a fan, though, never fear, we will revisit it once we are up and running.
Our intrepid support team next turned us on to a little project someone wrote called construct_call_graph. This is a static analyzer (yay!) and it puts out dot code for visualization (double yay!). We are working with it to extend its capabilities to show all modules that inherit anything from pandas, and highlight them in the graph. It also uses the getattr method to figure out what things are called and where they are from.
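To show the shape of what we are after (not construct_call_graph's actual implementation), here is a sketch of a static analyzer that collects caller-to-callee edges from the AST and emits dot code, flagging pandas calls in red. The visitor class and the sample source are my own inventions:

```python
import ast

# Hypothetical module under analysis.
source = """
import pandas

def load(path):
    return pandas.read_csv(path)

def summarize(path):
    frame = load(path)
    return frame.describe()
"""

class CallGraph(ast.NodeVisitor):
    """Collect caller -> callee edges from a module's AST, statically."""
    def __init__(self):
        self.edges = []
        self.current = "<module>"

    def visit_FunctionDef(self, node):
        outer, self.current = self.current, node.name
        self.generic_visit(node)      # calls inside belong to this function
        self.current = outer

    def visit_Call(self, node):
        # ast.unparse needs python 3.9+
        self.edges.append((self.current, ast.unparse(node.func)))
        self.generic_visit(node)

graph = CallGraph()
graph.visit(ast.parse(source))

# Emit dot, highlighting anything that reaches into pandas.
lines = ["digraph calls {"]
for caller, callee in graph.edges:
    style = " [color=red]" if callee.startswith("pandas.") else ""
    lines.append(f'  "{caller}" -> "{callee}"{style};')
lines.append("}")
print("\n".join(lines))
```

Feeding that output to dot gives a hierarchical picture of the dependencies, with the pandas touchpoints jumping out in red, which is the visual we want for scoping the testing work.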
Mike is doing great work learning how to walk the AST for all the dependency data. I’m teaching him how to format it nicely in a graph. And we are both discovering the joys of making python write and read files.
While Mike works on that I’m reading up on nose, the python unit test tool that they use here. Our benevolent overlords on the project picked a representative module for us to start working on, in developing the dependency graph, learning the domain, and discovering how the code and the tests are constructed.
For now I’m reading up on nose, and looking over the source code for the first module we are testing as well as the tests included in the same source directory. When I figure out how it fits together I’ll let you know. Stay tuned!
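One thing I’ve learned already: nose keeps tests lightweight. It discovers plain functions whose names start with "test", so no unittest class boilerplate is required. This is a made-up example in that style, not one of the project’s actual tests:

```python
# nose collects any module matching test* and runs the test_* functions
# inside it. The function under test here is hypothetical.

def add_sales_tax(amount, rate=0.05):
    return round(amount * (1 + rate), 2)

def test_add_sales_tax_default_rate():
    assert add_sales_tax(100) == 105.0

def test_add_sales_tax_zero_amount():
    assert add_sales_tax(0) == 0.0

# Run from the shell with:  nosetests test_example.py
```

That low ceremony should make it easy to read the existing tests alongside the source, which is exactly what I’m doing now.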