Gen AI is a very powerful tool that simplifies complex tasks in many areas including the technology field. This article tries to answer the question: Can Gen AI reduce the complexity of testing in telecom?
The short answer is Yes, in multiple ways, but AI won’t do all the work for us.
Mobile telephony is an easy-to-use service with a lot of complexity behind the scenes. Making a phone call is trivial, yet this simple operation involves numerous systems and dozens of messages, exchanged from the initial device authorization all the way to call teardown.
There are a few reasons why there are so many systems and messages involved:
Security
The communication takes place over an unsecured medium (wireless). Authorization and setting of the encryption keys must be performed before any call/data session. Encryption makes sure nobody can listen to your conversation or see your data transfer. Authorization, on the other hand, makes sure your phone can’t be cloned, which would allow another malicious device to receive or make calls as if it were your phone.
Standardization
The standardization of GSM is done by 3GPP (https://www.3gpp.org/about-us). The main driver for this standardization is interoperability: between operators, and between various vendors. An NE (Network Element) that is part of the GSM core network will work the same way for an operator in the United States as for an operator in Indonesia.
This standardization has some obvious advantages (roaming, for instance—a service we couldn’t live without these days), but it also has some drawbacks. The architecture was split into multiple systems (Network Elements) with clearly defined functionality and message flows. All mobile operators must use these Network Elements in the same way. None of them can decide they don’t like how things are working and choose to handle calls differently, like for instance having a single system performing all the logic. Everyone must stick to what the standard specifies.
Mobility and multiple generations of GSM (2G/3G/4G/5G) which must coexist
We can make calls over a 2G/3G connection or over a 4G/5G connection, depending on the coverage provided by the mobile operator in the area where we are located. The type of connectivity used is not within our control, and we expect consistent behavior for our calls. For instance, we expect to be informed if the called party has been ported to another mobile operator, and we expect to be charged the same way regardless of the connection used for making that call. Moreover, a call may start under 4G coverage as a VoLTE call and continue as a 3G call once the 4G coverage is lost. The caller shouldn't notice this transition; to them, it is the same call. For the mobile operator, however, switching from 4G to 3G is a big change that involves multiple systems and messages.
The Challenge
Testing a mobile service is as easy as making and answering a phone call. Or so it seems.
Testing using mobile phones has a few advantages:
It doesn’t require any special equipment/system; no investment is needed, as normal GSM phones can be used.
It doesn’t require specialized testing personnel. Anyone can use a phone, and the complexity of the systems involved in making a call is not visible during testing.
It provides end-to-end testing, validating the user experience.
This testing method appears to be simple and very effective, and it has therefore been adopted by many mobile operators. Moreover, it has been automated, either with specialized equipment or by remotely controlling mobile phones; there are many solutions available for this type of automation.
If this method is effective, automated, and end-to-end, what more could be required? Well, let’s take a closer look at what this method does not cover. First of all, it checks only the edges of the solution. Did we notify all the systems that should have been notified about that call? We can’t say because this is not part of the test.
To make a parallel with testing an online shop: testing if the Place Order function works properly is done solely on the result page seen by the user. Whether the warehouse or the invoicing system was notified about that order is not checked. This would be unacceptable for testing an online shop. So why is it acceptable for mobile operators? We’ll discuss this a bit later.
The second big drawback of this mobile phone testing method is the limitation imposed by the device used for these tests. Several types of tests can’t be executed:
Roaming tests. The test phone is typically located in the office, within the country of the mobile operator, so all calls/events initiated from that phone will be national. As a funny side note, I was discussing this problem with the test lead of a mobile operator. She mentioned that when they need to test changes impacting roaming flows, they sometimes drive to the nearest border. It's a one-and-a-half-hour drive, and they must be close to the border at midnight, when the maintenance window starts. It's not something they like or want to do, but there's no other way they can test roaming scenarios.
Tests using the reference/test network instead of the live network. In these cases, the device must use the testing infrastructure, which may only be available in dedicated test sites, sometimes even requiring the terminal to be isolated in a Faraday cage.
International and premium destinations. For international calls, someone needs to answer the call at the other end, which is difficult to do when the device is not under your control. Premium numbers are expensive to call or text, so they are typically skipped in manual or automated testing.
Long calls. If you have an offering with 2000 national minutes included, testing what happens after these minutes are depleted requires 2000 minutes of testing (~33.3 hours). This makes it impossible to run such tests nightly, since they would not finish in time for the following day's testing.
A new question arises: With all these problems, what makes this testing method so widely adopted? The answer lies in the complexity of the systems involved and the difficulty of having a test team with the required specialized technical knowledge. When running acceptance tests for Network Elements, mobile operators rely on the supplier of that NE. The supplier’s engineers possess the deep technical knowledge, and the mobile operator typically only observes and validates the process, without performing any actual testing themselves.
At the same time, mobile operators focus on testing new functionalities, such as a new voice plan, or a new data offering (e.g. free access to Instagram and TikTok). Regression testing is only seen as a nice-to-have.
The Solution
There isn’t a simple solution. If one existed, it would have been already used by mobile operators. However, this doesn’t mean there is no solution. Since it’s a complex problem, the best approach is to split it. Isolate the complex technical parts from the business-driven parts.
The technical parts hardly ever change in terms of the systems involved and message flows: they must be compliant with the 3GPP standards, so there isn't a lot of room for creativity. What changes from test to test are the attributes/parameters of the messages. If you have a parametrized module that sends the messages and validates the responses, all you need to do is call that module with the right parameter values. You don't need to know the protocols involved or the specific messages that will be exchanged; the module handles this complexity for you. This allows the QA team to run proper and complete testing without requiring deep technical knowledge.
For instance, let's consider the example above. There is a new voice plan where calls are charged differently. When placing a call, a CAP session triggers a Diameter Ro session towards the OCS for 2G calls, or a SIP session triggers a Diameter session for VoLTE (4G) calls. If you have a module that receives as parameters the calling party (A#), the called party (B#), and the duration of the call, the QA team doesn't need to know CAP, SIP, or Diameter, even though the test suite makes use of these protocols.
This separation allows the QA team to focus on testing functionality while the modules simulate and validate the flows and data exchanged over telco-specific protocols. Testing becomes a bit more complicated than making a phone call, but not significantly so. The modules need to be called with the right parameters, and their output needs to be validated. This can be done by an orchestrator (for instance, a Shell/Python script) that takes input text files in CSV format and outputs the results in CSV format. The CSV format has several advantages:
It is human-readable
It has a very clear structure
It can be edited with well-known applications such as Excel, where data validation can be added to reduce the risk of human error
Having the test data (input data and expected results) in files opens the door to automation. The test execution can be easily integrated into a CI/CD pipeline. However, there is one additional thing to be considered before declaring the tests automated. The test scenarios need to be executed repeatedly and produce consistent results. They must be idempotent and repeatable to be added to an automated test suite. The steps of an idempotent test are:
Setup/configure required data for the test.
Execute the test steps.
Validate the results.
Delete/restore the data modified at step 1.
How can AI help
The success of Generative AI created a lot of hype, and enterprises are increasingly adopting Gen AI across their organizations. ChatGPT and GitHub Copilot have proven able to generate pieces of code and have become very useful tools for software developers.
Can Gen AI be used effectively in testing? Certainly, it can, and there are two main areas where it can help. (Note: the use cases presented below are not theoretical; they have been successfully implemented.)
Test case generation
This is considered the Holy Grail of Gen AI in testing: take as input a test plan, or even better the specification document, and generate the test suite. While Gen AI is not yet at this point, just as in software development, it can be used by QA engineers to develop test cases faster. The complexity isolation described above is very useful when generating test cases with AI.
Expecting Gen AI to generate the right messages, in the right order, and with the right parameters according to 3GPP is unrealistic. And even if it could, the benefit would be limited, as new business requirements don't modify the 3GPP specifications. However, asking Gen AI to generate CSV files in a specific format from test data described in natural language is a realistic expectation. For instance, you can give Gen AI a prompt such as: “Verify that a national call of 5 minutes deducts 300 units from NationalSeconds balance” or “A call of 2 minutes to +49123456789 should charge 0.012 EUR from the monetary balance”.
With some clever prompt engineering, Gen AI will generate CSV lines in the right format. This allows the QA team to focus on what they want to test rather than how the test is going to be conducted. Another benefit is significantly reducing the ramp-up effort required for new team members.
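For illustration, the first prompt above might be rendered as a CSV line like the following. The column layout here is invented for the example; the real format is whatever your orchestrator modules expect:

```csv
test_id,a_number,b_number,call_type,duration_s,balance_name,expected_deduction
TC_001,+40721000001,+40721000002,national,300,NationalSeconds,300
```

The QA engineer writes the intent in plain language; the generated row plugs straight into the CSV-driven orchestrator described earlier.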
Troubleshooting support
There are situations where it's crucial to understand exactly what went wrong in a test case, especially during regression testing. A failure there most likely blocks the new release from being deployed to production, so the issue must be investigated.
If the problem is related to the business logic introduced by the new release, it may be easier to identify the cause. On the other hand, issues related to telco-specific protocols used during regression testing pose greater challenges, especially when the QA team lacks deep knowledge of these protocols.
Another scenario where detailed telco understanding is crucial is when developing telco-specific modules. If the QA engineer writes a test that fails, is the failure a test problem or an application problem? The 3GPP standard and the application specifications should provide clarity in such cases. However, in practice, this isn’t always the case. Have you ever tried to read a 3GPP document? To put it mildly, it’s not the most easily readable documentation. The complexity arises because each document references another, which references another, and so on. This complexity, while justified by the technical intricacies of telco standards, can be daunting for newcomers to the field.
Besides the standards and the project/system-specific documentation, another important source of information for the QA team is the history of tickets previously reported for that project/system. Since, in the telco world, a system is used for many years (often more than 10), these tickets provide valuable information. However, the sheer volume of tickets can be overwhelming, making it difficult, if not impossible, for a QA engineer to determine if a current problem has been previously reported. As a result, new tickets are frequently created, further increasing the number of tickets and decreasing the likelihood of identifying similar or identical issues.
Gen AI proves to be very useful for this problem. All we need is to create a custom knowledge base that includes:
Standards and protocol specifications (3GPP docs)
Product and project documentation
Tickets reported during the product/project lifecycle (from the ticketing system, e.g. JIRA)
This way, Gen AI can quickly provide relevant information about a particular situation, indicating which parts of the documents are applicable. This saves hours or even days of digging through standards. Identifying existing tickets similar to the current failure is also extremely valuable, as these tickets include details on how the problem was solved, which might be applicable to the current situation.
Asking questions in natural language makes the adoption of such a solution nearly instantaneous.
Bottom Line
Even though using Gen AI in testing is not yet mainstream, it has already been proven to facilitate the testing process. Thus, I anticipate a gradual but continuous adoption of Gen AI in testing overall, and specifically in telecom testing.
We live in a world where convenience is king. Millions of electronic devices work in tandem to simplify our lives. The brain in these devices is the microcontroller. Today, we’re going to talk about the ARM microcontroller, which is the heart and soul of consumer electronic devices like smartphones, tablets, multimedia players, and wearable devices.
To start off, there are two main processor architecture designs: RISC (reduced instruction set computer) and CISC (complex instruction set computer). ARM is the poster child for RISC; in fact, it's right there in the name: Advanced RISC Machine.
Its highly optimized and power-efficient architecture makes it indispensable in today’s world. Let’s look at its design in more detail.
A Powerful Brain for Embedded Systems
A mobile or tablet is a shining example of an extremely portable computing device.
It’s a great way to keep your life organized, communicate with practically anyone, consume media content, and enjoy unlimited games and entertainment. These capabilities just keep improving over time.
But there is a silent struggle between applications and the hardware they run on. We have all experienced that annoying lag on our smartphones, not to mention the battery giving up on us when we need it the most. Luckily, ARM is packed with features to help us manage this.
Let’s Talk Simplicity
An ‘assembly instruction set’ is the language understood by the ARM controller. Its design plays a crucial role in enabling us to perform a task in an efficient and optimized manner. ARM has a reduced instruction set (RISC). This does not mean there are fewer instructions available for use; it means a single instruction does less work, i.e., a small atomic task.
As an example, consider adding two numbers: a RISC design uses separate instructions for loading the operands, adding them, and storing the result, whereas a CISC design would handle all of this in a single instruction. A simple instruction set does not require complex hardware design, which enables an ARM controller to use fewer transistors and take up less silicon area. This reduces power consumption, which is critical for battery-operated devices, along with corresponding savings in cost. But RISC controllers need a greater number of instructions to execute a task compared to CISC, and the compiler design for generating machine code from higher-level languages such as C is more complex in this case.
Hence one needs to write optimized code to extract the best performance from ARM.
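To make the contrast concrete, here is a schematic sketch of how the two designs might express `c = a + b` on values held in memory. The mnemonics and addressing forms are simplified for illustration and do not correspond exactly to any specific core:

```
; RISC style (ARM-like): simple load/operate/store steps
LDR  r0, [a]      ; load operand a into a register
LDR  r1, [b]      ; load operand b into another register
ADD  r2, r0, r1   ; add the two registers
STR  r2, [c]      ; store the result back to memory

; CISC style (illustrative): one instruction touching memory directly
ADD  [c], [a], [b]
```

Four simple steps versus one complex one: the RISC path needs more instructions, but each is cheap to implement in hardware.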
Dealing with the Energy Vampire
An hour of intense gaming drains your battery and leaves you scrambling for a wall charger or power bank. This is because a lot of computations are done in specially designed hardware units in ARM, which need extra power. These units barely consume any power when your device is idle. This means there is a direct relation between the intensity of computations and energy consumption.
Every microcontroller needs a clock pulse, which is comparable to the heartbeat of the controller. It governs the speed at which instructions are executed and helps the controller keep time while performing tasks or governing the rate at which peripherals are run. The commencement and duration of any action that a processor may perform can be expressed in terms of clock cycles. A lower clock rate reduces the power consumption, which is critical for embedded devices but unfortunately also leads to a drop in performance. An instruction pipeline helps to boost performance and throughput while enabling a lower clock rate to be used. This can be compared to the functioning of a turbocharger in a car engine, where the real saving is in the benefits of using a smaller capacity engine but boosting it to match one that is larger and more powerful.
With careful programming, we can increase the instruction throughput to do a lot more in a single clock cycle. Such judicious use of the system clock preserves battery life, reducing the need to charge the battery frequently.
Busy as a Bee
Another critical feature that speeds up execution is the instruction pipeline. It introduces parallelism in the execution of instructions. All instructions go through the fetch, decode, and execute stages which involve loading the instruction from program memory, understanding what task it performs, and finally, its execution. We have an instruction in each stage of the pipeline at any point in time. This increases throughput and speeds up code execution. Imagine you are at work, and each time you complete a task, your manager has a new one kept ready so that you are never idle. Yes, that would be the perfect analogy for the instruction pipeline. It reduces the wastage of clock cycles by ensuring there are always instructions fetched and available for execution.
A Math Specialist
A core part of computing involves transforming data and making decisions. Speed and accuracy are paramount in such situations. ARM has you covered with hardware units for arithmetic and logical instructions, enhanced DSP, and NEON technology for parallel processing of data. In short, all the bells and whistles needed to handle everything from music playback to powering drone platforms.
The NEON coprocessor is capable of executing multiple math operations simultaneously.
It reduces the computational load on the main ARM controller. The design of these math units allows us to balance the tradeoff between computational speed and accuracy. Depending on the application requirements, we may choose to perform four 16-bit multiply operations in parallel via NEON instead of four 32-bit multiply operations sequentially in the ARM ALU (arithmetic and logic unit). The precision of the final result is reduced due to the use of 16-bit operands in NEON, but the gain in computational speed is significant. The ability to provide such multimedia acceleration is what makes ARM the main choice for portable audio, video, and gaming applications.
Conclusion
We see that the system designers have attempted to balance performance, power consumption, and cost to produce a powerful embedded computing machine. As portability and efficiency demands increase, we can see ARM’s influence continue to expand.
An application, if designed appropriately to leverage all of ARM’s features, can provide stunning performance without draining the battery.
It takes a special level of skill to tune an application in “assembly language,” but the final result exceeds expectations. The next time you see a tiny wearable device delivering unbelievable performance, you know who the hidden star of the show is.
FLAC stands for Free Lossless Audio Codec, an audio format similar to MP3 but lossless. This means audio is compressed in FLAC without any loss in quality. It is generally used when we have to encode audio without compromising quality.
FLAC is an open-source codec (software or hardware that compresses or decompresses digital audio) and is free to use.
We chose to deploy the FLAC encoder on an ARMv7 embedded platform.
ARMv7 is a version of the ARM processor architecture; it is used in a wide range of devices, including smartphones, tablets, and embedded systems.
Let's dive into how to optimize FLAC's performance specifically for the ARMv7 architecture. This will give you valuable insight into why optimizing FLAC matters.
So, tighten your seat belts, and let’s get started.
Why Do We Need to Optimize FLAC?
Optimizing FLAC's performance will make it faster, so it will encode/decode (compress/decompress) audio more quickly. The points below explain why we need fast codecs.
Suppose you’re using one of your favorite music streaming apps, and suddenly, you encounter glitches or pauses in your listening experience.
How would you react to the above? A poor user experience will cause this app to lose users to the competition.
There can be many reasons for that glitch to happen, possibly a network problem, a server problem, or maybe the audio codec.
The app’s audio codec may not be fast enough for your device to deliver the music without any glitches. That’s the reason we need fast codecs. It is a critical component within our control.
FLAC is a widely used HiRes audio codec because of its lossless nature.
Optimizing FLAC for ARMv7
Why Optimize for the ARM Platform?
Most music playback devices use ARM-based processors: mobiles, tablets, car systems, FM radios, wireless headphones, and speakers.
They use ARM because of its small chip size, low energy consumption (good for battery-powered devices), and lower heat generation.
Optimization Techniques
FLAC source code is written in the C programming language. So, there are two ways to optimize.
Since the FLAC source code is written in C, we can rearrange it, or write it in a certain way, so that it executes faster. Let's call this the C Optimization Technique.
We can convert some parts of the FLAC source code into machine-specific assembly language. Let’s call this technique ARM Assembly Optimization as we are optimizing it for ARMv7.
In my experience, assembly optimization gives better results.
To discuss optimization techniques, first, we need to identify where codec performance typically lags.
Usually, a codec uses complex algorithms that involve many mathematical operations.
Loops are also one of the parts where codecs generally spend more time.
Also, the above calculations require frequent access to main memory (RAM), which incurs a performance penalty.
Therefore, before optimizing FLAC, we have to keep the above things in mind. Our main goal should be to make mathematical calculations, loops, and memory access faster.
C Optimization
There are many ways in which we can approach C optimizations. Most methods are generalized and can be applied to any C source code.
Loop Optimizations
As discussed earlier, loops are one of the parts where a codec generally spends more time. We can optimize loops in C itself.
There are two widely used methods to optimize the loop in C.
Loop Unrolling –
Loops have three parts: initialization, condition checking, and increment.
On every iteration of the loop, we have to test the exit condition and increment the counter.
This condition check disrupts the flow of execution and imposes a significant performance penalty when working on a large data set.
Loop unrolling reduces branching overhead by working on a larger data chunk before the condition check.
Let’s try to understand by an example:
/* Original loop with n iterations. Assuming n is a multiple of 4. */
for (int i = 0; i < n; i++) {
    sum += a[i] * b[i];
}

/* Loop unrolled by 4 */
for (int i = 0; i < n; i += 4) {
    sum += a[i]     * b[i];
    sum += a[i + 1] * b[i + 1];
    sum += a[i + 2] * b[i + 2];
    sum += a[i + 3] * b[i + 3];
}
As you can see, after unrolling by 4, we test the exit condition and increment the counter only n/4 times instead of n times.
Loop Fusion –
When two loops use the same data structure, we can combine them instead of executing them separately. This removes the overhead of one loop, so the code executes faster. But we need to ensure that the number of loop iterations is the same and that the operations are independent of each other.
Let’s see an example.
/* Loop 1 */
for (i = 0; i < n; i++) {
    prod *= a[i] * 5;
}

/* Loop 2 */
for (i = 0; i < n; i++) {
    sum += a[i];
}

/* Merging the two loops removes the overhead of one loop */
for (i = 0; i < n; i++) {
    prod *= a[i] * 5;
    sum += a[i];
}
As you can see in the code above, both loops use the array a[], so we can merge them; the condition check and increment then run n times instead of 2n.
Memory Optimizations for the ARM Architecture
Memory access can significantly impact performance in C since multiple processor cycles are consumed for memory accesses. ARM cannot operate on data stored in memory; it needs to be transferred to the register bank first. This highlights the need to streamline the flow of data to the ARM CPU for processing.
We can also utilize cache memory, which is much faster than main memory, to help minimize this performance penalty.
To make memory access faster, data can be arranged so that accesses are sequential, since sequential accesses consume fewer cycles. By optimizing memory access, we can improve FLAC's overall performance.
Fig-1 Cache memory lies between the main memory and the processor
Below are some tips for using the data cache more efficiently.
Preload the frequently used data into the cache memory.
Group related data together, as sequential memory accesses are faster.
Similarly, try to access array values sequentially instead of randomly.
Use arrays instead of linked lists wherever possible for sequential memory access.
Let’s understand the above by an example:
/* Before interchange: accessing a[j][i] with j as the inner index is
   inefficient, because consecutive j values jump across rows in memory */
for (i = 0; i < n; i++) {
    for (j = 0; j < m; j++) {
        /* access a[j][i] -- non-sequential, cache-unfriendly */
    }
}

/* After interchange: the innermost index i walks each row sequentially */
for (j = 0; j < m; j++) {
    for (i = 0; i < n; i++) {
        /* access a[j][i] -- sequential, cache-friendly */
    }
}
As the example shows, loop interchange significantly reduces cache misses; in our measurement, the optimized code experienced a cache-miss rate of only about 0.19%. For an array a[1000][900], this accumulated to a performance improvement of 20% on ARMv7.
Assembly Optimizations
First, we need to understand why assembly optimizations are required.
With C optimization, we have only limited access to hardware features.
In ARM Assembly, we can leverage the processor features to the full extent, which will further help in the fast execution of code.
ARMv7 provides a NEON co-processor, a Floating Point Unit, and an EDSP unit, all of which accelerate mathematical operations. We can explicitly access such hardware only via assembly language.
Compilers convert C code to assembly code, but may not always generate efficient code for certain functions. Writing those functions directly in assembly can lead to further optimization.
The below points explain why the compiler doesn’t generate efficient assembly for some functions.
The first obvious reason is that compilers are designed to convert any C code to assembly without changing the meaning of the code. The compiler does not understand the algorithms or calculations being used.
The person who understands the algorithm can, of course, write better assembly than the compiler.
An experienced assembly programmer can modify the code to leverage specific hardware features to speed up performance.
Now let me explain the most widely used hardware units in ARM, which accelerate mathematical operations.
NEON –
The NEON co-processor is an additional computational unit to which the ARM processor can offload mathematical calculations.
It is like a subconscious mind (co-processor) helping our brain (processor) ease the workload.
NEON does parallel processing; it can perform up to 16 additions, subtractions, etc., in a single instruction.
Fig-2 Instead of adding 4 pairs of values one by one, NEON adds them all in parallel
FLOATING POINT UNIT – This hardware unit is used to perform operations on floating point numbers. Typical operations it supports are addition, subtraction, multiplications, divisions, square roots, etc.
EDSP(Enhanced Digital Signal Processing) – This hardware unit supports fast multiplications, multiply-accumulate, and vector operations.
Fig-3 ARMv7 CPU, NEON, EDSP, FPU, and Cache under ARM Core
Approaching Optimizations
First of all, we have to identify which functions to optimize. We can find this out by profiling FLAC.
Profiling is a technique for learning which sections of code take more time to execute and which functions are called frequently. We can then optimize those sections or functions.
Below are some tips for deciding which optimization technique to use.
For performance-critical functions, ARM Assembly should be considered the first option for optimization, as it typically provides better performance than C optimization: we can directly leverage hardware features.
When there is no scope for using the hardware units (which primarily accelerate mathematical operations), we can go for C optimizations.
To determine if assembly code can be improved, we can check the compiler’s assembly output.
If there is scope for improvement, we can write code directly in assembly for better utilization of hardware features, such as Neon and FPU.
Results
After applying the above techniques to the FLAC Encoder, we saw an improvement of 22.1% in encoding time. As you can see in the table below, we used a combination of assembly and C optimizations.
Fig-4 Graphical visualization of average encoding time vs Sampling frequency before and after optimization.
Conclusion
FLAC is a lossless audio codec used to preserve quality for HiRes audio applications. Optimizations that target the platform on which the codec is deployed help in providing a great user experience by drastically improving the speed at which audio can be compressed or decompressed. The same techniques can apply to other codecs by identifying and optimizing performance-critical functions.
The optimization techniques we used are bit-exact, i.e., after optimization you get exactly the same audio output as before.
However, it is important to note that although we can trade bit-exactness for speed, it should be done judiciously, as it can negatively impact the perceived audio quality.
Looking ahead, as research into new compression algorithms and hardware continues, we are likely to see new and innovative ways to optimize audio codecs for better performance and quality.
Anything and everything we hear can be described by an audio signal.
Every sound has its own characteristics, and our ears can isolate and identify them.
High-resolution music has become quite the buzzword these days, but can we identify it simply by listening to it? The audio quality may also vary depending on factors like encoder settings, type of compression, speaker quality, etc. We will interpret this via an audio spectrogram.
What you will learn:
Why is the frequency domain so important for audio?
How to estimate the audio quality of a track?
How to relate what you hear to a spectrogram?
What does a high-quality musical instrument look like in a spectrogram?
What is an audio spectrogram?
An audio spectrogram is a useful tool for analyzing digital audio, allowing you to visualize and understand how the audio signal evolves over time.
Features like frequency distribution, audio bandwidth, tone quality, etc. can be determined.
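As a minimal sketch of how a spectrogram is computed and read, here is `scipy.signal.spectrogram` applied to a synthetic two-tone signal (the sampling rate and tone frequencies are arbitrary choices for illustration):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16_000  # sampling rate in Hz (an arbitrary choice for this sketch)
t = np.arange(0, 2.0, 1 / fs)

# A synthetic "track": a 440 Hz tone for the first second, 880 Hz for the second.
signal = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))

# freqs: frequency bins (Hz); times: frame centers (s); sxx: power per bin per frame.
freqs, times, sxx = spectrogram(signal, fs=fs, nperseg=1024)

# The brightest bin in each frame is the dominant frequency at that moment.
dominant = freqs[np.argmax(sxx, axis=0)]
print(dominant[0], dominant[-1])  # ~440 Hz early, ~880 Hz late
```

Plotting `sxx` as a heatmap over `times` and `freqs` gives exactly the spectrogram images discussed below: time on one axis, frequency on the other, and brightness for energy.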
High-quality audio at a glance:
The file is large, as more detail is stored.
The spectrum is spread over a wide range of frequencies, e.g., high-resolution audio sampled at 192 kHz can contain frequencies up to 96 kHz.
What is the time domain?
Observing an audio signal in the time domain gives us an idea of the overall volume of the track and how it varies with time. The loud, soft, and silent components of a track can be easily identified. This, however, does not tell us much about the quality of the instrument being played.
To identify sound quality, we need to look at it from the frequency domain.
Frequencies are the fundamental components of any sound.
High-quality audio contains a wide range of frequencies.
In any time interval, the resultant sound is due to the constructive and destructive interference of multiple frequencies.
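A small example of moving from the time domain to the frequency domain, using a synthetic mixture of two sines (the frequencies are arbitrary choices for illustration):

```python
import numpy as np

fs = 8_000
t = np.arange(0, 1.0, 1 / fs)

# In the time domain this mixture just looks like one wiggly waveform...
signal = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

# ...but the frequency domain separates it cleanly into its components.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two strongest bins land on the component frequencies, 300 and 1200 Hz.
peaks = freqs[np.argsort(spectrum)[-2:]]
print(sorted(peaks))
```

The waveform itself gives no hint that exactly two tones are present; the spectrum makes it obvious, which is why quality judgments are made in the frequency domain.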
Another metric of audio quality is timbre.
Timbre can be called audio flavor or tone. It allows a listener to distinguish between the musical tone of a violin or trumpet even if the tone is played at the same pitch with the same loudness.
The timbre of an instrument is determined by the overtones it emphasizes.
An overtone is any frequency component above the fundamental; for harmonic sounds, the overtones fall at integer multiples of the fundamental frequency.
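The overtone structure can be illustrated by synthesizing a note from a fundamental plus a few harmonics. The weights below are invented for illustration, not taken from any real instrument; a different set of weights would give a different timbre:

```python
import numpy as np

fs = 16_000
t = np.arange(0, 0.5, 1 / fs)
fundamental = 220.0  # A3

# A made-up timbre: relative weights of the fundamental and its overtones.
weights = [1.0, 0.6, 0.4, 0.25, 0.1]  # harmonics 1x .. 5x

note = sum(w * np.sin(2 * np.pi * fundamental * (k + 1) * t)
           for k, w in enumerate(weights))

spectrum = np.abs(np.fft.rfft(note))
freqs = np.fft.rfftfreq(len(note), d=1 / fs)

# Energy concentrates at the fundamental and its integer multiples.
top5 = np.sort(freqs[np.argsort(spectrum)[-5:]])
print(top5)  # peaks near 220, 440, 660, 880, 1100 Hz
```

In a spectrogram of this note, the 220 Hz line would be the brightest, with progressively fainter lines stacked above it, exactly the pattern described next.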
The composition of a musical instrument’s sound in terms of its partials can be visualized by a spectrogram. Now, let’s try to understand what a spectrogram really is.
Interpreting a spectrogram:
A spectrogram is a heatmap-style visualization of all the frequencies in an audio track over time.
The higher the energy of a frequency, i.e., the louder it gets, the brighter it looks in the heatmap.
By looking at the patterns in the distribution of frequency energy, we gain valuable insights about an audio signal that cannot be obtained from the time domain.
Frequency patterns relate to the different components of a song. You can identify instruments, vocals, and tunes from lead instruments. The fundamental frequencies can be easily identified as they are the brightest in color in a given interval. The overtones are stacked above with decreasing volume.
Here, we see a single note played on an organ. The fundamental frequency is the brightest while the overtones are stacked above with decreasing volume.
A trained musician can control the loudness and sustain of an instrument, which is a marked element of performance. The bright colors in the spectrum are the loudest frequencies. This will relate to a lead tune that will dominate the sound.
This is a spectrogram of an arpeggio played on a piano.
How to judge the audio quality of a track?
A wide range of frequencies indicates better audio fidelity.
The sound’s complexity depends on the interference of the overtones.
More overtones indicate a much richer sound from the instrument.
The volume modulation indicates the control and feel a musician is attempting to generate.
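One simple way to quantify "a wide range of frequencies" is an effective-bandwidth estimate: the frequency below which, say, 99% of the spectral energy lies. Here is a sketch on synthetic signals; for a real track, the samples would be loaded from a file instead:

```python
import numpy as np

def effective_bandwidth(signal, fs, energy_fraction=0.99):
    """Frequency below which `energy_fraction` of the spectral energy lies."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    cumulative = np.cumsum(spectrum) / np.sum(spectrum)
    return freqs[np.searchsorted(cumulative, energy_fraction)]

fs = 48_000
t = np.arange(0, 1.0, 1 / fs)

# A "narrow" signal vs. a "wide" one with strong high-frequency content.
narrow = np.sin(2 * np.pi * 1_000 * t)
wide = narrow + np.sin(2 * np.pi * 15_000 * t)

print(effective_bandwidth(narrow, fs))  # ~1 kHz
print(effective_bandwidth(wide, fs))    # ~15 kHz
```

A heavily compressed track would show its bandwidth cut off well below the Nyquist limit, which is also visible in its spectrogram as a dark band at the top.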
This is a spectrogram of notes played on a violin.
It is softer in sound, smooth, and connected; in musical terms, it’s a clear example of legato.
Softer overtones are visible, which add to the complexity of the sound.
All notes have approximately the same volume and sustain.
Conclusion:
A spectrogram is a great way to try and understand the sounds you hear in the world around you. With one, you will be able to analyze the characteristics of any sound source. Your entire music listening experience will become more intricate and fulfilling.
Do apply the above principles and remember to have fun while doing so. Stay tuned for similar content.