blue bar
Logic Architect - bringing the power of database to Excel

Data structure

The best ways to organise data were first put forward in the early 1970s and have changed remarkably little since. The use of data analysis, and the resulting data tables, form the start point for every good database system. There is a standard way of splitting the data into separate tables, which is known as normalisation.

Excel, however, has always been different. The start point is an empty grid, and the user can put anything anywhere - there is nothing to lead the user towards any approach rather than any other. What can happen as a result is that data becomes scattered throughout a workbook and is mixed in with the analysis which is the purpose of the workbook. Such workbooks can become very difficult to extend and maintain.

However the advantages of good data design apply just as much to Excel as to any database system:

To give a simple example of how this would work: imagine a table of invoices, and each invoice is associated with a customer. Customers have codes, also names, addresses and other information. One could include all this information on the invoice table, but doing so would result in a great deal of duplication, as one would need to repeat the address etc. on every invoice for the same customer, and there would also be the risk of error in doing it this way. So instead one would just include the customer code on the invoice, then set up a separate customer table with the other customer details.

The end point of this process is data in a number of different tables. Possibly the most significant improvement to Excel was the introduction of support for Tables in Excel 2007, which among other features, for the first time allowed the data to be referenced by its name rather than its address. However there were still limitations - for example the traditional Pivot Table is designed to work from a single Excel table. And although it is easy to set up a formula to lookup a table based on one column, looking up two columns requires a slightly awkward array formula which can be slow with large tables.

Subsequent versions of Excel have improved on this, for example with the introduction of Power Pivot and Power Query, now known as Get and Transform. However these do not always fit easily into existing workbooks or the types of analysis which the user wants to carry out. And to use Get and Transform for anything beyond what is provided in the user interface, the user has to learn the M language which is unlike anything else in Excel.

Data Manipulation Tools

We use a toolset for manipulating tabular data efficiently in Excel VBA which greatly simplifies project development. The software we use is also available for user download on GitHub.

In addition to worksheets and databases, temporary, virtual tables can be created and manipulated in a succession of steps to produce the final desired result. The main features are:

The data transformations provided include:

There is a wide variety of output options. The results can be written back to the workbook as a table, a simple list or fed into a pivot table. Alternatively a lookup function is provided so individual results can be placed in any cell.

Use of SQL

In almost every database system, the main way data is handled is through SQL. And with good reason - almost any transformation of the data can be carried out in a single step. But although it is possible to use SQL with Excel, it is not straightforward as one can not directly reference the active workbook. We also provide a tool to set up the necessary infrastructure so that SQL can be used easily in a variety of ways:

Also, sometimes Excel tables are not adequate for the storage of data and a true database is needed. Please see Database Options for our approach to this.