Democrats Unleash "Demzilla" on the GOP

Scott Eden

Business Intelligence Pipeline

Aug 24, 2004

With Election Day little more than three months away, the technology department at the Democratic National Committee is hiring, and evidently their desire to staff up at such a late date has a lot to do with the success of their huge voter and donor tracking system.

About three years ago, the DNC hired Plus Three, a small technology firm that specializes in IT consulting for nonprofit organizations, to help build its system. The decision came at a pivotal moment, not long before the 2002 midterm elections, when the Republican Party had had such a system up and running for some time.

The DNC, meanwhile, had a decrepit internal database running off an AS/400. It had a green-screen terminal interface, and it contained an e-mail donor list of just 70,000 people, said Doug Kelly, the DNC's technology director. "When you think that 50 million people voted for Gore, we did a dismal job."

Many observers, in fact, partly attribute the GOP's state and federal victories in that election to its far more mature, and enormous, database of voters and contributors, known as Voter Vault, about which the party is as tight-lipped as a Langley Cold Warrior.

The DNC is a little less so about its system, which is now Web-based and open-source. The system comes in two pieces: DataMart is essentially a gigantic phonebook of all the country's 166 million registered voters. The goal is to attach key information, or a voter ID, to each of those people — party affiliation, some consumer data, how their home precinct voted, census figures, 306 slices of information in all — and then to mine and model that data in order to perform two functions: entice voters to the booths to vote Democrat, and entice those already converted to fork over cash or, perhaps, to volunteer in some way. Essentially it's a direct-marketing system tweaked slightly for the political realm. The problem, of course, is getting all that key information attached to the names on the DataMart list. There are privacy issues to deal with, for instance, and an enormous amount of research that must be done, so the database remains incomplete.

The second piece is Demzilla, the DNC's internal transactional database, which includes the names of, and key information on, any person or group with which the DNC does business — the Rolodex. Mostly Demzilla is a list of donors, both large and small. But it also includes volunteers, activists, local and state party leaders, and members of the press.

By phone, by direct mail and, mostly, by e-mail, people on the DataMart list are targeted with ads and political messages, tailored as much as possible to that person, based on what the DNC can dig up about their demographic information, their possible pet issues, etc. Should the person contribute or agree to volunteer, into Demzilla goes that name.
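To make the relationship between the two stores concrete, here is a minimal sketch of the flow just described: a voter sits in a DataMart-like list until he or she responds, at which point an entry is created in a Demzilla-like contact store. The class names, field names and the `record_response` helper are all hypothetical, invented for illustration. The DNC has not published its schema, and the real system runs on MySQL and Perl rather than in-memory Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class VoterRecord:
    """One DataMart-style entry: a registered voter plus any known attributes."""
    voter_id: str
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. party, precinct result


@dataclass
class Contact:
    """One Demzilla-style entry: someone the committee actually deals with."""
    voter_id: str
    name: str
    roles: set = field(default_factory=set)  # e.g. {"donor", "volunteer"}


def record_response(voter, role, contact_store):
    """Create the contact on first response, then add the new role to it."""
    contact = contact_store.get(voter.voter_id)
    if contact is None:
        contact = Contact(voter.voter_id, voter.name)
        contact_store[voter.voter_id] = contact
    contact.roles.add(role)
    return contact


# Hypothetical sample data, not real voters.
datamart = {
    "v1": VoterRecord("v1", "Pat Example", {"party": "D"}),
    "v2": VoterRecord("v2", "Sam Sample"),
}
demzilla = {}

record_response(datamart["v1"], "donor", demzilla)
record_response(datamart["v1"], "volunteer", demzilla)
print(sorted(demzilla["v1"].roles))
print("v2" in demzilla)
```

In this sketch a voter who never responds (here, `v2`) stays only in the voter list, which mirrors the article's distinction between the "phonebook" and the "Rolodex."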

Building the system was not an easy project to undertake or complete, especially with the DNC rushing to catch up with its cross-town rival. DNC Chairman Terry McAuliffe, famed for his salesmanship with six-figure donors and the $5,000-a-plate set, spearheaded the effort, which largely focuses on small donors, a la the early Howard Dean primary campaign. "We shamelessly steal stuff that's effective," the DNC's Kelly said. The DNC also had to broker deals with state Democratic organizations, which feed their voter information into DataMart. In return, the information collated in DataMart and Demzilla is then used locally by the state party organs. The database effort was part of a $25 million rehab McAuliffe made of the DNC as a whole.

DNC officials will not divulge just how they're able to mine and analyze and drill down into all that data — the BI end of the DataMart/Demzilla system — the one aspect in which they resemble their tight-lipped Republican counterparts. "I'd rather not talk about that," Kelly said. "I can tell you after November third." He said the DNC uses a mix of BI technology developed both in-house and by outside consultants.

Plus Three, a Washington-based firm with about 21 employees, built the system using an open-source software stack similar to eBay's or Google's (the Linux operating system from Red Hat, the Apache Web server, the MySQL database and Perl, the Practical Extraction and Report Language) for reasons of both cost and "freedom," said David Brunton, one of Plus Three's founders. Open source made the most sense, he said, because the DNC wanted to do its own data mining and analytics. Once Plus Three completed the assembly, it could turn over the source code to the DNC's techies, get them up to speed, and let them have at it. In this particular business, open source also has advantages over closed-format software, Brunton said, because changes in potential donor targeting often need to be made on the fly: if people are for some reason unwilling, on a particular day, to give out their phone numbers, the DNC could write up some code to deal with that contingency and implement it almost immediately. The software runs on hardware typical of open-source deployments: AMD-based servers from Penguin Computing.

As for the build-out, Brunton said a major challenge was integrating the database with its disparate data sources. Though open source made the problem easier to overcome than a closed-format system would have, he said, another obstacle arose: how to make the physical connections between systems fast enough yet stable enough to handle all that data flow, with voter information streaming into DataMart (and then into Demzilla, depending on the direct-marketing success) from volunteers knocking on doors and entering survey answers into laptops, or from voters clicking through a DNC e-mail. Plus Three also needed to link DataMart to all the far-flung systems used by the state party organizations.

The answer lay in RSS, or "really simple syndication," a feed technology that first took off among bloggers a few years ago. Plus Three developed its own kind of RSS for the DNC, which allowed it to deliver an XML stream between multiple systems. Plus Three's benchmark for a data-transfer rate was 5,000 records per second when those records needed to be parsed (or decoded and transformed into actual data), and 15,000 per second when they did not. "Anything less than that is probably slower than acceptable," Brunton said, "and anything faster is probably too fragile."

Another important piece of gear Plus Three used was Spread, the multicast messaging toolkit. Information gathered from online transactions might hit one of ten different servers, said Brunton. But a Spread machine allowed Plus Three to then multicast all the logs from those disparate servers, collecting them in one place, and in real time, rather than waiting for an end-of-the-day update. This timeliness is particularly valuable in the fundraising world, said Brunton. "With the ability to raise $5.5 million or $6.6 million in a day, it's important to know where you are in any given hour. It could affect ad buys, or a get-out-the-vote effort."
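As a rough illustration of what "parsing" such a feed and knowing "where you are in any given hour" might involve, the sketch below decodes small XML feeds from two servers and rolls the donations up by hour. Everything specific in it is an assumption: the `<feed>` and `<record>` element names, the attribute names and the sample amounts are invented, since Plus Three's feed format is not public. It also does not use Spread or any real multicast; the two strings simply stand in for logs that have already been collected in one place.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical feed format: element and attribute names are invented here.
FEED_SERVER_A = """<feed>
  <record voter_id="v1" amount="25.00" time="2004-08-24T09:15:00"/>
  <record voter_id="v2" amount="50.00" time="2004-08-24T09:40:00"/>
</feed>"""

FEED_SERVER_B = """<feed>
  <record voter_id="v3" amount="10.00" time="2004-08-24T10:05:00"/>
</feed>"""


def parse_feed(xml_text):
    """Decode one XML feed into a list of (hour, amount_in_cents) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter("record"):
        hour = rec.get("time")[:13]  # keep "YYYY-MM-DDTHH"
        dollars, cents = rec.get("amount").split(".")
        # Work in integer cents to avoid floating-point rounding on money.
        rows.append((hour, int(dollars) * 100 + int(cents)))
    return rows


def hourly_totals(feeds):
    """Combine records from every server's feed into per-hour totals (cents)."""
    totals = defaultdict(int)
    for xml_text in feeds:
        for hour, cents in parse_feed(xml_text):
            totals[hour] += cents
    return dict(totals)


totals = hourly_totals([FEED_SERVER_A, FEED_SERVER_B])
for hour in sorted(totals):
    print(hour, totals[hour] / 100)
```

A production pipeline of the kind Brunton describes would presumably process records as they arrive rather than in batches like this, but the shape of the work (decode each record, then aggregate across servers) is the same.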

The DNC says that DataMart and Demzilla have enabled the party to increase its number of listed donors from 400,000 at the time of the 2002 elections to "well over a million now," though it won't be more specific. It has also let the DNC cover the costs of prospecting for donations. No longer does it need to pay third-party vendors for lists of target voters, nor must it outsource its various e-mail campaigns. The cost of a very large e-mail blast, in other words, amounts only to the tech staff's payroll.

As good as all this sounds, the viability of the system has been called into question before. About a year ago, an article in Roll Call, the Capitol Hill weekly, quoted an anonymous "consultant," who said, "The system architecture is overly cumbersome and the result is that the data is not easily retrieved ... Worse, the quality of the data is far from a level that would make it immediately useful." Both the DNC and Plus Three vigorously denied this, of course. They say a different kind of politics was at work: sour grapes. The comment, they say, came from a Plus Three rival rejected by the DNC.