3 steps to reducing your software performance debt to zero
Justine Bonnot, CEO & Founder of WedoLow
This article details WedoLow’s approach to tracking software performance debt and correcting it, an approach that allowed our team to reduce the optimization time on an image processing application from 4 weeks (2 engineers full time) to 10 minutes, while cutting execution time by 50%. It explains how the right combination of profiling your software application and processor-related analysis can detect software performance debt and automatically resolve it.
Instead of relying on imprecise profiling tools that do not take into account the hardware target on which the software will be executed, at WedoLow we combine profiling through dynamic analysis with a thorough analysis of the assembly code produced for the considered hardware target.
You may want to send a copy of this article to your own embedded software development team.
Ready to learn how we do it at WedoLow?
Stay with us, we’ll explain it all ⬇️
STEP 1
First, know your software application (and what happens in it)
- What takes time in your application?
- Are you spending time in data processing (mathematical operations), in memory operations (load, store…) or in control operations?
- Is it logical compared to what you are supposed to do in your application?
A profiler will help you learn what happens in your application, but only at a coarse grain. It will help you find performance bottlenecks, but not determine whether your application spends most of its time where it should. To know that, you have to make the link between your application and the hardware that will execute it (hey, assembly code!).
Once you understand what happens in your software application, it is time to identify your performance bottleneck. For that, a good profiling tool can produce:
→ a call graph
→ a flame graph
If you want to know more about the different profiling techniques or the differences between a call and a flame graph ➡️ talk to our experts!
Once you know where your application spends its time, that is where you have to work.
If you focus on parts that account for an insignificant share of the total time, then even a huge local gain will not be impactful enough to be visible in the real-life behavior of your application.
Choose your battle.
STEP 2
Static analysis but make it fancy
Static analysis is great… but if you do not link your software with your hardware target, then the performance information you get will be completely uncorrelated with your real execution.
If performance is the metric to track, it cannot be read from the C/C++ source code alone. The source has to be enriched with assembly information and, when possible, profiling data. Even high-performing static analysis tools lack this information when tracking performance. The risk? Recommendations that are not adapted to your specific use case.
To solve this problem, a static analysis technique enriched with assembly and profiling information is key. Several properties (we call them “optimization techniques”) can then be tracked to check that your software application carries no performance debt with respect to your hardware target. A few examples below:
- correct data-type usage
- vectorization (are there zones that could be vectorized but are not vectorized automatically by the compiler?)
- correct usage of the instructions of your processor’s ISA
STEP 3
Measure, track your performance progress and repeat… as early as possible in your software development projects
Measuring and tracking the performance debt and complexity of your software application has to be done as early as possible in the development process. Indeed, it has an impact on numerous aspects:
- how you will design an algorithm (and choose between different filter structures, for instance)
- the output quality of your software application (depending on the performance obtained, quality can be traded off in favor of performance by implementing some approximations, reducing the bit-width of some data…)
- the choice of the hardware target (CPU load, memory… do have an impact on this choice; if known very early in the process, they help in arbitrating between different CPUs)
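As one example of the quality/performance trade-off mentioned above, here is a minimal sketch of bit-width reduction (illustrative helper names, not WedoLow code): storing values in `[-1, 1)` as Q15 fixed point halves memory traffic versus `float` on many embedded targets, at the cost of a quantization error of about 3e-5.

```cpp
#include <cstdint>

// Q15 fixed point: a float in [-1, 1) packed into 16 bits.
constexpr int16_t to_q15(float v)     { return (int16_t)(v * 32768.0f); }
constexpr float   from_q15(int16_t q) { return (float)q / 32768.0f; }

// Q15 multiply: widen to 32 bits for the product, then shift back down.
inline int16_t q15_mul(int16_t a, int16_t b) {
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}
```

For example, `q15_mul(to_q15(0.5f), to_q15(0.5f))` converts back to 0.25, with the error bounded by the 16-bit quantization step. Whether that error is acceptable is exactly the quality/performance arbitration to make early in the project.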
Many software engineers think they should tackle performance and its optimization at the end of the development process. That is generally too late: the risk is ending up with a very powerful high-level algorithm that delivers high output quality but is impossible to embed on the selected hardware target. The endless feedback loops between hardware and software teams, does that sound familiar?
If you follow this process, it will guide you and help you track the evolution of your software application’s complexity as well as measure your performance debt.
Hope this process helps! Do not hesitate to let me know your thoughts, and maybe your own tips, about it.