The case for business intelligence automation

Apr 1, 2024

Nour Lake

Hand building a conceptual workflow over blue background

With the digitization of business processes, data has become the lifeblood of organizations. Every business activity you engage in, from selling to pushing marketing campaigns or new product features generates data. And every decision you make requires accessing and understanding data. However, the explosion in the number and variety of data sources has made it more difficult to leverage data for decision making.


Native apps analytics

SMBs and startups often start without implementing any kind of BI system. Most (if not all) of their sources are in the form of SaaS applications, so they typically rely on the native analytics provided in their applications. This approach has its merits for companies in the earliest of stages, but quickly becomes very limiting for two main reasons:

  • They have no control: they can’t build their own metrics and custom reports to account for the specificity of their business model or special situations

  • They can’t blend data from multiple sources: for instance with separate marketing and product analytics apps, you can’t join data from product and marketing to get a fuller picture of your customer journey

Business people check in-app analytics directly


Manual business intelligence reports

After a while, the need for cross-sources analytics grows too strong, as more refined views of business activities are required. Companies often start with the easiest possible path which is to build their business intelligence reports manually.

To build a report manually, an analyst typically goes through these steps:

  • Extract csv files from SaaS applications and databases

  • Dump the data in a database or spreadsheet

  • Model the data with SQL or formulas to build the table required for the report

  • Build the report/dashboard in a spreadsheet or with a data visualization application

  • Share with the appropriate business people

Building BI reports in spreadsheets from CSV extracts from apps


This approach has some advantages:

  • It’s very flexible: you can define advanced custom metrics and reports to adapt to any specific situation for your organization

  • It’s multi-sources: you can always extract data from your sources of interest and manually join them to get more refined analytics

  • It’s easy to get started: you don’t have to spend time evaluating and comparing BI solutions or committing to one vendor.

However there are important drawbacks:

  • It’s error-prone: Updating reports manually is very repetitive and makes it hard to ensure data quality and trust. In 2012, JP Morgan Chase famously lost $6B partially because of an Excel error in their Value-at-Risk model in which  an employee had accidentally used a SUM total instead of an AVERAGE. While this is an extreme example, we’ve personally seen many companies with significantly inaccurate reports (some of which were sent to investors) because of errors related to manual processes. This is a serious problem because data is only useful if it’s trustworthy. In many cases having no data is better than having bad data, as making decisions confidently based on flawed data can have terrible consequences.

  • It’s slow and limited: new data never stops coming and manually updating reports is time consuming. As a consequence, it’s only feasible for quarterly or monthly reports, painful for weekly reports, and downright impossible for daily and intraday reports. In our digital age, the pace of business is ever increasing, and companies that don’t embrace business intelligence automation won’t be able to keep up.

  • It’s hard to maintain: building reports manually often implies scattering business logic in multiple spreadsheets or SQL queries with duplicated code. When the complexity increases and errors arise they are really hard to trace and debug. In reality this translates into accepting unreliable reports, which hinders the ability to make the best possible business decisions.

  • It’s not secure: managing security manually is statistically a recipe for disaster. When you send reports manually via emails with no automated process to control access rights of the recipient, you eventually send critical information to the wrong person.

While these drawbacks can quickly become detrimental to the future of a business, a surprising number of companies with dozens of employees still tinker with a manual approach to analytics. This is often because they are extremely focused on their day to day activities, and perceive analytics as an optional distraction with a long-term ROI. This is flawed thinking because data is one of the most powerful levers for growth, and by its very nature it requires to be trustworthy to be useful. The main purpose of data is to bring us some form of truth, and this can only be achieved with proper processes in place.


Pipelines for business intelligence automation

The right approach to business intelligence is to stop thinking in one-off batches and start thinking in terms of pipelines. Data pipelines automate all the processing required to have reliable data with which to build your reports and dashboards. Typically, a data pipeline automatically performs the following steps at regular intervals:

  • Pull data from multiple sources and centralize it in a single warehouse

  • Clean up and prepare the raw data tables for modeling

  • Perform the transforms to model the data to your business needs

  • Perform tests to ensure data integrity and quality, and trigger alerts if needed

  • Calculate defined business metrics

  • Update dashboards and reports

  • Send alerts/notifications to business users

Pipelines transforming data from business apps into dashboards for business people


The advantages to this approach are massive:

  • It’s reliable: problems like missing values or schema changes always occur when pulling data from multiple external sources, and the only way to ensure data quality is to implement tests. When pipelines are automated their testing is also automated, thereby improving the quality and reliability of the data dramatically compared to a manual process.

  • It’s scalable: automating pipelines makes it possible for a single analyst to manage hundreds of data pipelines and still get reliable results. This is completely unrealistic with a manual approach.

  • It’s secure: security, by its very nature, requires a systematic approach that calls for automation. Typically, well designed automated business intelligence pipelines would check if a specific business user has permissions to access/view/download specific data tables to prevent breaches or mishaps. A more refined approach is to implement row level security (RLS) when data rows from a single table are filtered dynamically depending on the user querying the data. The main advantage is to serve many users from a single master table (less maintenance), while still controlling data access.


AI-powered business intelligence

So far we only covered data pipelines, because they are the structural foundations to having  any reliable business intelligence. However, the last mile of business intelligence (traditionally limited to dashboards) can greatly benefit from automation through the magic of AI.

In a typical setup, a business person with a specific data-related business question would submit a ticket to the BI team, and get a report/dashboard a few days later to answer her question. This process is painfully slow (it’s often a major bottleneck to becoming truly data-driven), time consuming for the data team, and these reports often end up unused. The key here is that business people want answers on the spot, and not every question should become a dashboard.

AI-powered analytics offer a great solution to this problem. Typically offered in a conversational interface, they empower non-technical people to ask questions and get instant answers without nagging the data team. The beauty of these systems is that they simultaneously democratize access to the data curated by the data team AND they automate insights discovery (to a certain extent). Self-service analytics have been the holy grail of the Business Intelligence industry for decades, and through AI this dream is finally coming true.

However, there is a huge caveat with AI-powered analytics: the Large Language Models (LLMs) which power such applications are prone to hallucinations. Hallucinations are a phenomenon in which generative AI systems provide inaccurate or nonsensical output, going as far as completely inventing answers. This is obviously defeating the initial purpose of BI to improve decision making by leveraging the “truth” found in data, but fortunately there are few approaches that help mitigate this risk:

  • Controlling the context: this consists in limiting what is fed to the LLMs to only the tables and metrics that are carefully curated by the data team for business purposes. Giving the AI access to your entire warehouse is not a good idea.

  • Providing metadata: systematically documenting every asset like tables, columns, and metrics is a good practice to build scalable data operations, and it’s even more crucial to provide more relevant context to the AI and prevent hallucinations

  • Fine-tuning: base models are too generic and can only go so far. As in many other fields, fine-tuning the models to the specific task will yield important improvements.


Feeding curated tables, business metrics, and metadata to an LLM to power a chat


While it’s easy to fall prey to the shiny object syndrome with AI, making it useful will require proper data operations discipline, and an end-to-end approach to be able to leverage as much context as possible. Disjointed data stack where the different components minimally communicate with each other will create metadata silos that will be detrimental to the deployment of useful AI-powered analytics.


Conclusions

Relying on manual data handling for decision making is predictably unsustainable: it’s slow, prone to errors, and carries hidden costs. Organizations should instead strive to automate data processes for their business intelligence as early as possible, and as much as possible, because the benefits will greatly outweigh the investment. Additionally, the integration of artificial intelligence is immensely promising as it  enables swift access to actionable insights for everyone regardless of technical level or data literacy. However, the important risk of hallucination with generative AI must be taken into account. It is mitigated through an integrated approach that feeds the AI a refined context and controls the outcome.

Ready to become

data-driven?

Ready to become

data-driven?