Business Intelligence and Reporting App - Part 1
In this two-part blog series, we will describe the technical aspects of how we created a Business Intelligence and Reporting platform for our clients in manufacturing, retail, etc.
The Goal
The goal of this platform is to let end-users (analysts) write SQL (MS SQL Server, MySQL and PostgreSQL) queries which result in tabular downloadable data-pulls. The analysts can then do further one-off analyses and reporting tasks in their their day-to-day role. These analysts are assumed to be SQL-savvy and our end users are assumed to want to get custom-reporting data in Excel-friendly format for their subsequent analysis. We chose our reference design application set-up with a Django REST API serving a React frontend that uses Bootstrap for UI. More importantly, we are using react-bootstrap-table
to display tabular data, so our API results will need to be easy to plug in there.
The Backend Service
Model
Our queries are defined as a user-defined strings captured in the Report model:
from django.db import models
class Report(models.Model):
name = models.CharField(max_length=255)
primary_key_field_name = models.CharField(
max_length=255,
default="ID",
)
query = models.TextField()
# This could be made into a ChoicesField if you have multiple db connections.
database_connection_name = "my_db"
results = None
Serializers
Next, we need to figure out what format the frontend needs to receive the data in, and make a serializer for that. After perusing the react-bootstrap-table
docs, I came up with 3 serializers:
class Columns(serializers.Serializer):
name = serializers.CharField()
# Use `ReadOnlyField` because we don't know the type of the data
options = serializers.ListField(child=serializers.ReadOnlyField())
class Results(serializers.Serializer):
rows = serializers.ListField(child=serializers.ReadOnlyField())
columns = serializers.ReadOnlyField()
class ReportWithResults(serializers.ModelSerializer):
results = Results()
class Meta:
model = Report
fields = ("id", "name", "primary_key_field_name", "results")
Notes:
Columns
name
: the name of the column, like "id"options
: All of the unique values that appear in this column. We need this because we want to use the "Select" filters from React Bootstrap Table. It requires that we supply a list of the available options to filter by in a given cell.Results
rows
: This is the bulk of the results of the query. The typing of this is pretty open. It could probably be better expressed asserializers.ListField(child=serializers.DictField(child=serializers.ReadOnlyField()))
. Remember thatReadOnlyField
is the "any" type for Django REST serializers.ReportWithResults
A regular DRF model serializer with the added
results
key that holds onto the data from the other 2 serializers.
Querying the db
Now we need to actually run the query against the db. For that, I slightly modified the dictfetchall
function example from the Django docs on using raw SQL so that it returns the columns names in addition to the list of dicts:
def dictfetchall(cursor):
"""
Returns a 2-length tuple:
First element: The column names from the query.
Second element: All rows from a cursor as a list of dicts.
"""
result_column_names = [col[0] for col in cursor.description]
result_rows_as_dicts = [
dict(zip(result_column_names, row)) for row in cursor.fetchall()
]
return result_column_names, result_rows_as_dicts
Performing the query in the view is simple:
class ReportResultsView(APIView):
permission_classes = [IsAuthenticated]
def get(self, request, pk, format=None):
report = get_object_or_404(Report, pk=pk)
try:
with connections[report.database_connection_name].cursor() as cursor:
cursor.execute(report.query)
column_names, rows = dictfetchall(cursor)
except OperationalError as e:
raise ValidationError(
f"Failed to run query against {report.database_connection_name} db. Query was: {report.query}. Error message: {e}."
)
So, we've got our rows
key for the serializer! We just need to make the columns. This part was a bit trickier:
columns = [
{
"name": name,
# options is a sorted list of all unique values that live in a given column
"options": sorted(
list(set(row[name] for row in rows)),
key=lambda el: (el is not None, el),
reverse=True,
),
}
for name in column_names
]
This was mostly straightforward. The really tricky part was the key
function - if you try to run sorted
on a list that contains the None
value, you'll get an error because Python will try to run "my_value" > None
which is not allowed. I had to ask a question on Stack Overflow for this one. The trick is that when you compare tuples, Python compares the values one at a time. So it will first compare the boolean generated by el is not None
. If they are both True
(ie, we are not comparing None
), then Python goes on to the next value in the tuple. Otherwise it stops where it found the difference!
report.results = {
"rows": rows,
"columns": columns,
}
return Response(ReportWithResults(report).data)
That's basically it for the backend. We just need to return and move onto the React side!