Tricking Excel's SUMPRODUCT()
Personal note: These days, I'm much more likely to accomplish the same with using a pivot table. These are available in Excel and most other spreadsheet programs such as Google Docs Sheets.
Furthermore, I'm increasingly finding that the Unix philosophy of using structured plain text along with custom scripts to sort, collate, and summarize data meets my needs with far greater flexibility and long-term survival than any proprietary spreadsheet format. (CSV remains incredibly handy as a common interchange that just about any tablular data-handling program will import or export.)
But enough about that. On with the explanation...
Why SUMPRODUCT() is one of the most useful functions in Excel
By far, the most flexible and useful function for Excel formulas that I have found for my purposes is SUMPRODUCT. It is extremely useful, but not particularly obvious. Here I attempt to explain how it works, how it can be tricked into doing cool things with data, and how I use it in two spreadsheets I have used on a semi-daily basis for about ten years.
An Explanation of SUMPRODUCT
Sumproduct multiplies each element of its parameters together and then adds the results.
Example
Given the following table:
A | B | C | |
1 | 1 | 1 | 1 |
2 | 2 | 2 | 2 |
3 | 3 | 3 | 3 |
And a formula that looked like this:
SUMPRODUCT(A:A, B:B, C:C)
You would get back a value of 36. This is derived like so:
First the values in the columns A, B, and C are multiplied together: 1*1*1=1 2*2*2=8 3*3*3=27 Then the row's results (the product) are added (summed) together: 1+8+27=36
You now have the sum of the products, thus sumproduct.
I'm sure there are times when you would want to use the function for exactly this purpose, but I've never encountered one. Instead, I like to trick it into doing cool tests.
Tricking SUMPRODUCT
SUMIF is great when you want to add a bunch of cells together based on the value of that or another cell. But what sucks about it is that it can only test one condition (cell value) at a time. I tried many different ways of forcing multiple conditions on SUMIF, but all were either incredibly awkward or simply didn't work.
By replacing normal column values with conditions, we can get SUMPRODUCT to work like SUMIF without the single condition restriction. The trick is really quite simple and looks like this:
SUMPRODUCT((A:A=B:B)*1, C:C)
The first parameter is (A:A=B:B)*1. It is a condition using the equality operator (=). You can read (A:A=B:B) as "items in column A equal items in column B." As SUMPRODUCT cycles through each row of data, this test will return a TRUE or FALSE value for each test. The *1 portion multiplies the conditional result by 1, which is a short and easy way to turn TRUE and FALSE into 1 and 0.
The second parameter is simply a range in the C column.
All together, the statement can be read as "sum the values in column C where the values in columns A and B are equal." Of course, you could even switch the order of the statements around to read that way, like SUMPRODUCT(C:C, (A:A=B:B)*1)
. My personal preference is to have the condition as in my example, but it's up to you.
Example
Let's see how our conditional trick statement works using another example table:
A | B | C | |
1 | 1 | 1 | 1 |
2 | 2 | 2 | 3 |
3 | 3 | 6 | 5 |
And here is the formula again:
SUMPRODUCT((A:A=B:B)*1, C:C)
Let's see how SUMPRODUCT sees our data.
SUMPRODUCT((1=1)*1, 1) which is (TRUE*1, 1) which is (1, 1) which is 1*1=1 SUMPRODUCT((2=2)*1, 3) which is (TRUE*1, 3) which is (1, 3) which is 1*3=3 SUMPRODUCT((3=6)*1, 5) which is (FALSE*1, 5) which is (0, 5) which is 0*5=0
When added together, the results of the three rows are:
1+3+0 = 4
Multiple Tests with SUMPRODUCT
Once we know how to trick SUMPRODUCT into summing columns of numbers based on one condition, adding more conditions is very simple. We simply add the conditions as additional SUMPRODUCT parameters.
Go from this...
SUMPRODUCT((A:A=B:B)*1, C:C)
...to this...
SUMPRODUCT((A:A=B:B)*1,(A:A=C:C)*1,(A:A=D:D)*1, E:E)
...and beyond. This can be read as "sum the values in column E if column A equals columns B, C, and D."
Matching Specific Values
The above examples are very abstract. Hopefully matching category names will bring this concept home a little better.
Imagine you have the following table of animals (cats and snakes) and their weights:
A | B | C | |
1 | Name | Type | Weight |
2 | Hissy | Snake | 4 |
3 | Paws | Cat | 21 |
4 | Bitey | Snake | 3 |
5 | Claws | Cat | 25 |
6 | Cuddles | Cat | 19 |
Now, let's say we want to get the total weight of all of our snakes. To do this, we simply create a condition based on the "Type" column, B in which we check for the value "Snake". Then we use SUMPRODUCT() to add up the values in the "Weight" column, C:
SUMPRODUCT((B2:B6="Snake")*1, C2:C6)
This gives us the total weight of our snakes: 7.
Getting the total weight of our cats is just a matter of copying that formula and changing the matched value to "Cat" like so:
SUMPRODUCT((B2:B6="Cat")*1, C2:C6)
Which returns 65, as you'd expect. NOTE: if you copy the text of the formula, you'll have no problem. However, if you copy the formula's cell in Excel, you will find that Excel shifts the ranges on you. To prevent this, you can anchor the cells with the '$' symbol like so:
SUMPRODUCT(($B$2:$B$6="Snake")*1, $C$2:$C$6) SUMPRODUCT(($B$2:$B$6="Cat")*1, $C$2:$C$6)
The '$' can be used to anchor either the row B$2
, the column $B2
, or both, as seen above. Now you can copy and paste the formula without fear of losing your carefully-selected range.
Making it Even More Flexible
One thing that can help make this animal weight-counting sheet even better is to use cells specifically for matching. Let me show you what I mean by that with this simple addition to our animal weights sheet:
A | B | C | |
1 | Name | Type | Weight |
2 | Hissy | Snake | 4 |
3 | Paws | Cat | 21 |
4 | Bitey | Snake | 3 |
5 | Claws | Cat | 25 |
6 | Cuddles | Cat | 19 |
7 | |||
8 | Total Weights | ||
9 | Snake | 7 | |
10 | Cat | 65 |
You'll see that I added cells at A9 and A10 which are labeled "Snake" and "Cat". These not only indicate which weights are being totalled, but are also being used to match the animal type in column B. The formulas for the weight totals in column B next to each label are:
SUMPRODUCT(($B$2:$B$6=A9)*1,$C$2:$C$6) SUMPRODUCT(($B$2:$B$6=A10)*1,$C$2:$C$6)
The anchored '$' values aren't strictly nessessary, but they are very helpful. We can now copy and paste these formulas to create totals for new types of animals within seconds. If I wanted to add a total for goats, I would simply copy either the Snake or Cat row and change the text in column A to "Goat". Done!
Real-World Example One: A Budget and Ledger Sheet
Off and on for the last ten years, I have experimented with various forms of personal financial ledger keeping and budgeting. My Excel sheet has evolved over the years to something that is simple but quite powerful. At the core of the sheet is a formula that makes use of conditionals with SUMPRODUCT.
My Budget/Ledger has the following structure:
A | B | C | D | |
... | ||||
3 | Category | Jan | Feb | Mar |
4 | Food | 123 | 142 | 103 |
5 | Water | 15 | 7 | 11 |
6 | Books | 321 | 233 | 156 |
... | ||||
42 | Date | Amount | Category | Description |
43 | 12-Jan-2008 | 13 | Books | cookbook |
44 | 25-Jan-2008 | 2 | Water | 8 gallons |
45 | 3-Feb-2008 | 67 | Food | weekly food |
... |
Everything is in a single sheet. The top section has two important features. Column A holds the ledger item category names. Row 1 holds the abbreviated month names.
The bottom section starting on Row 42 contains the ledger entries for each expenditure. For each entry, Column A holds the date, Column B holds the dollar amount, and Column C holds the category name.
To get the table of monthly values for each category in the top section, I use the following formula:
=SUMPRODUCT( ($C$42:$C$2000=A4)*1, (MONTH($A$42:$A$2000) = MONTH(C$3&"-2008"))*1, $B$42:$B$2000 )
Of course, in Excel I have to have all of that on one line (sigh). This looks like a lot, but it's not too bad when broken down into sections.
The first test condition is $C$42:$C$2000=A4)*1
which can be read as "if the category at entry C42 to C2000 equals the one at A4..." The $ signs make the formula use absolute references so that when I duplicate the formula, it won't shift the references. 42 refers to the first row where ledger entries begin. 2000 is an arbitrary large number I am not likely to reach. You can think of $C$42:$C$2000 as being equivalent to something like "C:C" from the first examples. A4 is a reference to a specific category name from the top section.
The second test condition is (MONTH($A$42:$A$2000)=MONTH(C$3&"-2008"))*1
which can be read as "if the month portion of the date at entry A42 to A2000 equals the month at C3 in the year 2008..." This is very similar to the first condition, but instead of checking the category name, we are checking the entries' dates against a month value created from an abbreviated month name in Row 3.
Lastly, we sum the dollar amounts with the third parameter, $B$42:$B$2000
.
All together in plain English, the formula can be read as, "add the dollar amounts in ledger column B where ledger column A matches the month in row 3 and ledger column C matches the category name in top section column A."
Real World Example Two: My Reading Journal
In 1993 I was given a school assignment to keep a journal of every book I read. I am still keeping that journal today. The journal spreadsheet contains a number of interesting formulas for gathering averages, page counts, etc. SUMPRODUCT came in quite handy when I decided to try creating a chart where each bar represented three months of reading (page count) over the journal's lifetime. It looks something like this:
A | B | C | D | |
1 | Date | Title | Author | Pages |
2 | 12-Feb-2001 | How to Explode | Fink, Rat | 432 |
... |
The first thing I did was start a new sheet and populate it with a column of dates where each date was a leap of three months.
A | B | |
1 | Sep-93 | |
2 | Dec-93 | |
3 | Mar-94 | |
To the present... |
To get a page count for the books read in each three month period, I needed two tests:
- The date from the journal is greater than this date
- The date from the journal is less than the next date
Given that cells from the Journal sheet can be referenced using the "!" operator, the two test conditions look like this:
(Journal!A:A > A1)And
(Journal!A:A < A2)
Put into a formula, it takes shape as:
=SUMPRODUCT((Journal!A:A>A1)*1, (Journal!A:A<A2)*1, Journal!D:D)
In reality, Row 1 of the Journal sheet confused SUMPRODUCT if it was included, so absolute references had to be used and the formula became:
=SUMPRODUCT((Journal!A$2:A$1000>A1)*1, (Journal!A$2:A$1000<A2)*1, Journal!D$2:D$1000 )
Then it was simply a matter of duplicating the formula down the B column alongside the dates I had created...
A | B | |
1 | Sep-93 | 1850 |
2 | Dec-93 | 1243 |
3 | Mar-94 | 1627 |
... |
In the end, I had a nice pair of columns from which to make my bar chart. I was quite pleased.