Phase II - Preparation
When last we saw our intrepid new SDET, I was learning Python, learning the unit test framework nose, and helping my partner Mike determine the best way to graph the call tree for the production code we are going to be working on. Our mission, as you recall, is to enable an upgrade of the pandas and numpy libraries underpinning a wide array of Python functions at a financial-sector client.
My how times have changed. Last week feels like a lifetime ago.
Mike and I worked hard on the call graph tool, and now we are very happy with the product. It reads the Python code in a .py file and uses the Python parser to create a giant list of all the grammatical elements in the file; we call this the ast_list. Mike then uses some magic to traverse the ast_list and find all the function calls. Each of these has a variety of metadata associated with it, available in the list, and Mike seeks out this data to complete the record of the function call. He also captures all the import declarations so he can label functions with their parent file if necessary.
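For the curious, here is a rough sketch of that parsing step using the standard library's ast module; the function name and the exact shape of the records are illustrative, not our production code:

    import ast

    def find_calls_and_imports(path):
        """Parse one .py file; collect every function call and import it contains."""
        with open(path) as source:
            tree = ast.parse(source.read(), filename=path)

        calls, imports = [], []
        for node in ast.walk(tree):
            if isinstance(node, ast.Call):
                func = node.func
                if isinstance(func, ast.Attribute):           # e.g. numpy.sqrt(x)
                    prefix = getattr(func.value, 'id', None)   # 'numpy' when the value is a plain name
                    calls.append((prefix, func.attr, node.lineno))
                elif isinstance(func, ast.Name):               # e.g. sqrt(x)
                    calls.append((None, func.id, node.lineno))
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                imports.append(node)
        return calls, imports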
We then use Python's getattr function to figure out whether the function belongs to pandas or numpy; if it does, we decorate its node in the graph to call attention to it.
Finally, we write out the dependencies and any interesting node definitions to a .dot file, then use the dot.exe program (part of the fabulous Graphviz library) to render the final graph in .png format.
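Put together, the getattr check and the .dot output look roughly like this; the colors, file names, and function names are our own illustration rather than the tool's exact code:

    import subprocess
    import numpy
    import pandas

    def library_of(func_name):
        """Return 'numpy' or 'pandas' if the name resolves via getattr, else None."""
        for lib in (numpy, pandas):
            if getattr(lib, func_name, None) is not None:
                return lib.__name__
        return None

    def write_dot(edges, highlights, dot_path='callgraph.dot', png_path='callgraph.png'):
        """edges: (caller, callee) pairs; highlights: nodes to decorate (the pandas/numpy calls)."""
        with open(dot_path, 'w') as handle:
            handle.write('digraph calls {\n')
            for node in highlights:
                handle.write('  "%s" [style=filled, fillcolor=orange];\n' % node)
            for caller, callee in edges:
                handle.write('  "%s" -> "%s";\n' % (caller, callee))
            handle.write('}\n')
        # dot.exe on Windows; plain 'dot' elsewhere
        subprocess.call(['dot', '-Tpng', dot_path, '-o', png_path])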
A goal for Mike over the next few days is to create a reciprocal list: starting with the pandas and numpy calls being made, list the production code files and modules making those calls. He will generate it from the same code that constructs the call graph, since that code already has all the required data.
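Since the call records already pair every caller with every pandas / numpy callee, the reciprocal list is just the same data keyed the other way. Something along these lines, with the tuple layout purely illustrative:

    from collections import defaultdict

    def invert(call_records):
        """call_records: iterable of (source_file, caller, library_call) tuples.
        Returns {library_call: set of (source_file, caller)}, i.e. the reciprocal list."""
        reverse_index = defaultdict(set)
        for source_file, caller, library_call in call_records:
            reverse_index[library_call].add((source_file, caller))
        return reverse_index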
Capturing current behavior
The point, of course, is to show where to look for pandas and numpy interactions, and develop a record of the current behavior of the software prior to changing the version of pandas / numpy. Then we can show the effects of the pandas / numpy upgrade.
To capture the current behavior we are doing something oddly simple. We are using a Python capability called decoration to replace the function we want to study – such as a call to the numpy square root (sqrt) function – with a version that does what we want before and after the original function runs, while leaving the original function completely unchanged in between. Before running sqrt, we find and save the arguments to the call. After running sqrt, we capture the output.
First we decorate, or wrap, the function. Then we start the client’s legacy code running. The code calls numpy.sqrt some number of times during the run. Each time the function is called, the wrapper writes the arguments and results to a file as part of a unit test script. The arguments get fed into the function, and the result is written on the right-hand side of an ‘assert function(args) == results’ statement, locking in the expected behavior.
Later, when we are checking to see if sqrt behaves the same as before, we can use nose to run the scripts. Any failures will tell us that the module has changed in a way that the client’s code will care about – we know this because the client’s code wrote our nose scripts for us!
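To make the idea concrete, here is a bare-bones sketch of such a wrapper, assuming we are recording numpy.sqrt and appending generated asserts to a nose-discoverable script. The file name, the repr-based formatting, and the scalar-only assumption are ours for illustration, not the exact production wrapper:

    import os
    import numpy

    def recording(func, test_file='test_generated_sqrt.py'):
        """Wrap func so that every call appends an assert locking in today's behavior."""
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)   # run the original function, completely unchanged
            if not os.path.exists(test_file):
                with open(test_file, 'w') as handle:
                    handle.write('import numpy\n\n')
            arg_text = ', '.join([repr(a) for a in args] +
                                 ['%s=%r' % item for item in sorted(kwargs.items())])
            with open(test_file, 'a') as handle:
                # Scalar results assumed here; array results would call for numpy.allclose.
                handle.write('assert numpy.sqrt(%s) == %r\n' % (arg_text, result))
            return result
        return wrapper

    # Decorate (wrap) the function, then run the legacy code exactly as usual.
    numpy.sqrt = recording(numpy.sqrt)

Running nose over the generated file then replays every recorded call against whatever version of numpy is installed at the time.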
Completing the picture
Now that we have developed our proof of concept, we need to start running the production code with enough interesting data to put the pandas / numpy code through its paces. This comes down to creating a different set of scripts aimed at running the business modules rather than the pandas / numpy modules.
So far we have gotten to the point where we can run production code, specifically one particular function, from the command line, and get different output by passing various input parameters such as date or account name. My goals for next week: to create a call graph while running the production code dynamically (using pycallgraph) for comparison purposes; to write scripts targeting the functions that call pandas and numpy; and to see these scripts drive the creation of new unit test scripts via the wrapper.
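For the dynamic call graph, pycallgraph can wrap the command-line run. A minimal sketch, assuming the newer PyCallGraph / GraphvizOutput API (older releases use a module-level start_trace instead) and a made-up entry point called run_report:

    from pycallgraph import PyCallGraph
    from pycallgraph.output import GraphvizOutput

    from production_module import run_report   # hypothetical entry point

    output = GraphvizOutput(output_file='dynamic_callgraph.png')
    with PyCallGraph(output=output):
        run_report(date='2013-06-30', account='TEST_ACCT')   # illustrative parameters

Comparing this runtime graph against the static one from the ast tool should tell us whether either approach is missing calls.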
Every project needs a couple metrics/heuristics to help steer and to help stop. For this project, we have a simple steering heuristic: keep scripting a particular function while the wrapper functions are producing variety in the unit test scripts; turn your attention to the next function when the scripts stop changing. The stopping heuristic is equally simple: when we have written drivers for all the functions that we know call pandas and numpy, we are done. (We can always make it more complicated by rank-ordering the modules to cover, and working down the list until the client cries Uncle. In fact, that’s inevitably how it will go. But in theory, the stopping heuristic is pretty simple.)
Keep on checking back in. If you all are really good, and if we clean up our code enough that we are comfortable with the scrutiny, we may post some code samples.
PS – there is some controversy around the exact pronunciation of the word “numpy”. Mike rhymes it with “lumpy”, as in oatmeal, whereas I take the more sophisticated stance of “NUM-pie”, rhymes with “Yum – pie.” Does anyone know what the official pronunciation is?