The case for business intelligence automation
Apr 1, 2024
Nour Lake
With the digitization of business processes, data has become the lifeblood of organizations. Every business activity you engage in, from selling to running marketing campaigns to shipping new product features, generates data. And every decision you make requires accessing and understanding data. However, the explosion in the number and variety of data sources has made it harder to leverage data for decision making.
Native app analytics
SMBs and startups often start without any kind of BI system. Most (if not all) of their data sources are SaaS applications, so they typically rely on the native analytics those applications provide. This approach has its merits for companies in the earliest stages, but it quickly becomes very limiting for two main reasons:
They have no control: they can’t build their own metrics and custom reports to account for the specificities of their business model or for exceptional situations
They can’t blend data from multiple sources: for instance, with separate marketing and product analytics apps, there is no way to join the two datasets to get a full picture of the customer journey
Manual business intelligence reports
After a while, the need for cross-source analytics grows too strong, as more refined views of business activities are required. Companies often start with the easiest possible path, which is to build their business intelligence reports manually.
To build a report manually, an analyst typically goes through these steps (a minimal sketch of the workflow follows the list):
Extract CSV files from SaaS applications and databases
Dump the data into a database or spreadsheet
Model the data with SQL or formulas to build the table required for the report
Build the report/dashboard in a spreadsheet or with a data visualization application
Share with the appropriate business people
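To make these steps concrete, here is a minimal sketch of such a manual workflow in Python with pandas; the file names, columns, and the ROAS metric are hypothetical placeholders rather than a prescribed setup:

```python
import pandas as pd

# Extract: CSV exports pulled manually from two SaaS tools (hypothetical files and columns)
marketing = pd.read_csv("marketing_export.csv")  # campaign, customer_id, spend
sales = pd.read_csv("crm_export.csv")            # customer_id, deal_amount

# Model: join the two sources and compute a simple per-campaign metric
joined = sales.merge(marketing, on="customer_id", how="left")
report = (
    joined.groupby("campaign")
    .agg(revenue=("deal_amount", "sum"), spend=("spend", "sum"))
    .assign(roas=lambda df: df["revenue"] / df["spend"])
)

# Build and share: export the report for a spreadsheet or a visualization tool
report.to_csv("campaign_report.csv")
```

Every one of these steps has to be repeated by hand whenever the report needs refreshing, which is exactly where the drawbacks below come from.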
This approach has some advantages:
It’s very flexible: you can define advanced custom metrics and reports to adapt to any specific situation for your organization
It’s multi-source: you can always extract data from your sources of interest and manually join them to get more refined analytics
It’s easy to get started: you don’t have to spend time evaluating and comparing BI solutions or committing to one vendor.
However, there are important drawbacks:
It’s error-prone: updating reports manually is very repetitive and makes it hard to ensure data quality and trust. In 2012, JP Morgan Chase famously lost $6B partly because of an Excel error in their Value-at-Risk model, in which an employee had accidentally used a SUM instead of an AVERAGE. While this is an extreme example, we’ve personally seen many companies with significantly inaccurate reports (some of which were sent to investors) because of errors related to manual processes. This is a serious problem because data is only useful if it’s trustworthy. In many cases having no data is better than having bad data, as making decisions confidently based on flawed data can have terrible consequences.
It’s slow and limited: new data never stops coming, and manually updating reports is time-consuming. As a consequence, it’s only feasible for quarterly or monthly reports, painful for weekly reports, and downright impossible for daily and intraday reports. In our digital age, the pace of business keeps increasing, and companies that don’t embrace business intelligence automation won’t be able to keep up.
It’s hard to maintain: building reports manually often means scattering business logic across multiple spreadsheets or SQL queries with duplicated code. When complexity increases and errors arise, they are really hard to trace and debug. In practice this translates into accepting unreliable reports, which hinders the ability to make the best possible business decisions.
It’s not secure: managing security manually is, sooner or later, a recipe for disaster. When you send reports manually via email, with no automated process to control the recipient’s access rights, you eventually send critical information to the wrong person.
While these drawbacks can quickly become detrimental to the future of a business, a surprising number of companies with dozens of employees still tinker with a manual approach to analytics. This is often because they are extremely focused on their day-to-day activities and perceive analytics as an optional distraction with a long-term ROI. This is flawed thinking because data is one of the most powerful levers for growth, and by its very nature it must be trustworthy to be useful. The main purpose of data is to bring us some form of truth, and this can only be achieved with proper processes in place.
Pipelines for business intelligence automation
The right approach to business intelligence is to stop thinking in one-off batches and start thinking in terms of pipelines. Data pipelines automate all the processing required to have reliable data with which to build your reports and dashboards. Typically, a data pipeline automatically performs the following steps at regular intervals (a simplified sketch follows the list):
Pull data from multiple sources and centralize it in a single warehouse
Clean up and prepare the raw data tables for modeling
Apply the transformations to model the data to your business needs
Perform tests to ensure data integrity and quality, and trigger alerts if needed
Calculate defined business metrics
Update dashboards and reports
Send alerts/notifications to business users
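As an illustration, here is a simplified, self-contained sketch of one such pipeline run in Python, using an in-memory SQLite database as a stand-in warehouse; the source data, table names, and metric are hypothetical, and a real setup would rely on proper connectors, a warehouse, an orchestrator, and an alerting tool:

```python
import sqlite3
from datetime import datetime, timezone

def extract_crm_rows():
    """Stand-in for a real connector pulling data from a SaaS source."""
    return [("c1", 120.0), ("c2", 80.0), ("c2", 40.0)]

def run_pipeline() -> None:
    conn = sqlite3.connect(":memory:")

    # 1. Pull data from the source and centralize it in the warehouse
    conn.execute("CREATE TABLE stg_crm (customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO stg_crm VALUES (?, ?)", extract_crm_rows())

    # 2-3. Clean up and model the raw data with a SQL transform
    conn.execute("""
        CREATE TABLE fct_revenue AS
        SELECT customer_id, SUM(amount) AS revenue
        FROM stg_crm
        WHERE amount IS NOT NULL AND amount >= 0
        GROUP BY customer_id
    """)

    # 4. Test data integrity and trigger an alert if needed
    nulls = conn.execute(
        "SELECT COUNT(*) FROM fct_revenue WHERE customer_id IS NULL"
    ).fetchone()[0]
    if nulls:
        print(f"ALERT: {nulls} rows with a missing customer_id")  # stand-in for real alerting

    # 5-7. Calculate the metric, refresh the report, notify business users
    total = conn.execute("SELECT SUM(revenue) FROM fct_revenue").fetchone()[0]
    print(f"[{datetime.now(timezone.utc):%Y-%m-%d %H:%M}] total revenue: {total}")

if __name__ == "__main__":
    run_pipeline()  # in production this would run on a schedule (cron or an orchestrator)
```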
The advantages to this approach are massive:
It’s reliable: problems like missing values or schema changes always occur when pulling data from multiple external sources, and the only way to ensure data quality is to implement tests. When pipelines are automated their testing is also automated, thereby improving the quality and reliability of the data dramatically compared to a manual process.
It’s scalable: automating pipelines makes it possible for a single analyst to manage hundreds of data pipelines and still get reliable results. This is completely unrealistic with a manual approach.
It’s secure: security, by its very nature, requires a systematic approach that calls for automation. Typically, a well-designed automated business intelligence pipeline checks whether a given business user has permission to access, view, or download specific data tables, preventing breaches and mishaps. A more refined approach is to implement row-level security (RLS), where rows from a single table are filtered dynamically depending on the user querying the data (sketched below). The main advantage is serving many users from a single master table (less maintenance) while still controlling data access.
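Here is a minimal sketch of the row-level security idea, applied in the serving layer with Python and SQLite; the table, the region-based permission mapping, and the user names are hypothetical, and production systems typically enforce RLS in the warehouse or the BI tool itself:

```python
import sqlite3

# A single master table serves every user; the filter is injected by the system,
# never by the user, based on a (hypothetical) permission mapping.
USER_REGIONS = {"alice": "EMEA", "bob": "AMER"}

def query_revenue(conn: sqlite3.Connection, user: str):
    region = USER_REGIONS.get(user)
    if region is None:
        raise PermissionError(f"{user} has no access to the revenue table")
    return conn.execute(
        "SELECT region, SUM(amount) FROM fct_revenue WHERE region = ? GROUP BY region",
        (region,),
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fct_revenue (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO fct_revenue VALUES (?, ?)",
                     [("EMEA", 100.0), ("AMER", 250.0)])
    print(query_revenue(conn, "alice"))  # alice only ever sees EMEA rows
```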
AI-powered business intelligence
So far we have only covered data pipelines, because they are the structural foundation of any reliable business intelligence. However, the last mile of business intelligence (traditionally limited to dashboards) can also greatly benefit from automation, through the magic of AI.
In a typical setup, a business person with a specific data-related question submits a ticket to the BI team and gets a report or dashboard a few days later. This process is painfully slow (it’s often a major bottleneck to becoming truly data-driven) and time-consuming for the data team, and these reports often end up unused. The key point is that business people want answers on the spot, and not every question should become a dashboard.
AI-powered analytics offer a great solution to this problem. Typically delivered through a conversational interface, they empower non-technical people to ask questions and get instant answers without nagging the data team. The beauty of these systems is that they simultaneously democratize access to the data curated by the data team AND automate insights discovery (to a certain extent). Self-service analytics have been the holy grail of the Business Intelligence industry for decades, and through AI this dream is finally coming true.
However, there is a huge caveat with AI-powered analytics: the Large Language Models (LLMs) that power such applications are prone to hallucinations. Hallucinations are a phenomenon in which generative AI systems produce inaccurate or nonsensical output, going as far as completely inventing answers. This obviously defeats the core purpose of BI, which is to improve decision making by leveraging the “truth” found in data, but fortunately there are a few approaches that help mitigate this risk (a minimal sketch follows the list):
Controlling the context: this consists of limiting what is fed to the LLM to only the tables and metrics that are carefully curated by the data team for business purposes. Giving the AI access to your entire warehouse is not a good idea.
Providing metadata: systematically documenting every asset (tables, columns, metrics) is good practice for building scalable data operations, and it’s even more crucial for giving the AI relevant context and preventing hallucinations.
Fine-tuning: base models are too generic and can only go so far. As in many other fields, fine-tuning the models to the specific task will yield significant improvements.
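To illustrate the first two points, here is a minimal sketch of how a curated catalog and its metadata could be embedded in the model’s context instead of exposing the whole warehouse; the catalog content and the ask_llm() call are hypothetical placeholders for whatever curated assets and LLM backend you actually use:

```python
# Only curated, documented tables are exposed to the model; their metadata is
# embedded in the prompt so the model has relevant context (hypothetical catalog).
CURATED_CATALOG = {
    "fct_revenue": {
        "description": "Revenue per customer per month, net of refunds.",
        "columns": {
            "customer_id": "Internal customer identifier",
            "month": "First day of the month (DATE)",
            "revenue": "Recognized revenue in USD",
        },
    },
}

def build_prompt(question: str) -> str:
    schema_lines = []
    for table, meta in CURATED_CATALOG.items():
        schema_lines.append(f"Table {table}: {meta['description']}")
        for col, doc in meta["columns"].items():
            schema_lines.append(f"  - {col}: {doc}")
    return (
        "Answer using ONLY the tables below. If the question cannot be answered "
        "from them, say so instead of guessing.\n\n"
        + "\n".join(schema_lines)
        + f"\n\nQuestion: {question}\nSQL:"
    )

prompt = build_prompt("What was total revenue last month?")
# answer = ask_llm(prompt)  # hypothetical call to the LLM backend of your choice
print(prompt)
```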
While it’s easy to fall prey to shiny object syndrome with AI, making it useful requires proper data operations discipline and an end-to-end approach that can leverage as much context as possible. A disjointed data stack, where the different components barely communicate with each other, will create metadata silos that are detrimental to the deployment of useful AI-powered analytics.
Conclusions
Relying on manual data handling for decision making is predictably unsustainable: it’s slow, prone to errors, and carries hidden costs. Organizations should instead strive to automate the data processes behind their business intelligence as early as possible, and as much as possible, because the benefits will greatly outweigh the investment. Additionally, the integration of artificial intelligence is immensely promising, as it enables swift access to actionable insights for everyone, regardless of technical level or data literacy. However, the significant risk of hallucination with generative AI must be taken into account; it can be mitigated through an integrated approach that feeds the AI a refined context and controls the outcome.