About the Candidates data set

Key concepts

In American elections, a single race or contest often has multiple stages, phases, or dates. The most common pattern is a Primary that leads to a General election. The Primary and the General are what we refer to as distinct Stages, and both are part of the overarching Race, the contest to determine who will hold that Office for the next term.

Each row in our data set represents either one Candidate’s participation in one Stage, or, in the case of ranked-choice voting, one Candidate, Stage, and ranked-choice round. All candidates for a single contest can be grouped together by their Race ID; the results of any stage can be grouped together by Stage ID; and each candidate (campaign) will have the same Candidate ID for each row of data.

In the example graphic below, Candidate A and Candidate B will each have two rows, while Candidate C will only have one. All those rows will have the same Race ID, and all of a given candidate's rows will use that candidate's Candidate ID. The Stage ID and Stage Type will help distinguish the stages from each other.
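As a minimal sketch of those three groupings, the Python below assumes the data set has been loaded as a list of dictionaries. The key names (`race_id`, `stage_id`, `stage_type`, `candidate_id`) and the rows themselves are made up for illustration; they are not the data set's actual column headers.

```python
from collections import defaultdict

# Made-up rows mirroring the example: A and B reach the General, C does not.
# Key names are hypothetical stand-ins for Race ID, Stage ID, Stage Type, Candidate ID.
rows = [
    {"race_id": "R1", "stage_id": "S1", "stage_type": "Primary", "candidate_id": "A"},
    {"race_id": "R1", "stage_id": "S1", "stage_type": "Primary", "candidate_id": "B"},
    {"race_id": "R1", "stage_id": "S1", "stage_type": "Primary", "candidate_id": "C"},
    {"race_id": "R1", "stage_id": "S2", "stage_type": "General", "candidate_id": "A"},
    {"race_id": "R1", "stage_id": "S2", "stage_type": "General", "candidate_id": "B"},
]


def group_by(rows, key):
    """Return a dict mapping each value of `key` to the rows that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


by_race = group_by(rows, "race_id")            # every candidate in the contest
by_stage = group_by(rows, "stage_id")          # the results of one stage
by_candidate = group_by(rows, "candidate_id")  # one campaign across its stages

for candidate_id, candidate_rows in sorted(by_candidate.items()):
    print(candidate_id, [row["stage_type"] for row in candidate_rows])
```

With these sample rows, Candidate A and Candidate B each map to a Primary row and a General row, while Candidate C maps to a single Primary row.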

Common questions

How do I see all the races happening in one state on the same date?

Filter your data set on State and Election date.
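A short sketch of that filter, assuming rows loaded as dictionaries with hypothetical `state`, `election_date`, and `race_id` keys; the rows are invented. Because each race has several rows (one per candidate and stage), the sketch collapses the matches to distinct Race IDs.

```python
from datetime import date

# Invented rows; key names are hypothetical stand-ins for State, Election date, Race ID.
rows = [
    {"race_id": "R1", "state": "Ohio", "election_date": date(2024, 3, 19)},
    {"race_id": "R1", "state": "Ohio", "election_date": date(2024, 3, 19)},
    {"race_id": "R2", "state": "Ohio", "election_date": date(2024, 3, 19)},
    {"race_id": "R3", "state": "Ohio", "election_date": date(2024, 11, 5)},
    {"race_id": "R4", "state": "Texas", "election_date": date(2024, 3, 19)},
]


def races_on(rows, state, election_date):
    """Return the distinct Race IDs for one state and election date, in first-seen order."""
    seen = {}
    for row in rows:
        if row["state"] == state and row["election_date"] == election_date:
            seen.setdefault(row["race_id"], None)  # dict keeps insertion order
    return list(seen)


print(races_on(rows, "Ohio", date(2024, 3, 19)))  # ['R1', 'R2']
```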

Why am I seeing this candidate more than once for the same stage?

This could be due to Ranked-Choice Voting (RCV) or cross-filing. It might also be that the candidate was disqualified under one party and continued to seek office as a write-in. Compare the party and status attributes of the two records.
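One way to triage repeats is sketched below: group rows by stage and candidate, then compare the repeated rows' round, party, and status values. The key names, the status value "Disqualified", and the rows are hypothetical, and the classification is a rough heuristic rather than a rule the data set guarantees.

```python
from collections import defaultdict

# Made-up rows; stage_id, candidate_id, party, status, and rcv_round are
# hypothetical stand-ins for the data set's columns.
rows = [
    {"stage_id": "S1", "candidate_id": "C1", "party": "Democratic", "status": "Disqualified", "rcv_round": None},
    {"stage_id": "S1", "candidate_id": "C1", "party": "Independent", "status": "Lost", "rcv_round": None},
    {"stage_id": "S1", "candidate_id": "C2", "party": "Republican", "status": "Won", "rcv_round": None},
    {"stage_id": "S2", "candidate_id": "C3", "party": "Nonpartisan", "status": "Advanced", "rcv_round": 1},
    {"stage_id": "S2", "candidate_id": "C3", "party": "Nonpartisan", "status": "Won", "rcv_round": 2},
]


def repeated_in_stage(rows):
    """Return {(stage_id, candidate_id): rows} for candidates with 2+ rows in one stage."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["stage_id"], row["candidate_id"])].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


def likely_reason(group):
    """Heuristic guess based on which attributes differ between the repeated rows."""
    if len({row["rcv_round"] for row in group}) > 1:
        return "ranked-choice rounds"
    if len({row["party"] for row in group}) > 1:
        return "different parties: cross-filing or a write-in campaign"
    return "same round and party: compare status values"


for key, group in repeated_in_stage(rows).items():
    print(key, likely_reason(group))
```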

Why am I seeing a candidate marked as advanced from one stage but no row for their next stage?

If the stage was recently held, the record for their participation in the next stage may be pending creation.
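To list such candidates, the sketch below flags rows with a status of Advanced when the same Candidate ID has no row in a later stage of the same race. It assumes a hypothetical `stage_date` field for ordering stages (along with the other invented key names), and it skips RCV rows that are followed by a later round of the same stage, since those only advance to the next round.

```python
from datetime import date

# Invented rows; all key names, including stage_date, are hypothetical.
rows = [
    {"race_id": "R1", "stage_id": "P", "stage_date": date(2024, 5, 1), "candidate_id": "A", "status": "Advanced", "rcv_round": None},
    {"race_id": "R1", "stage_id": "P", "stage_date": date(2024, 5, 1), "candidate_id": "B", "status": "Advanced", "rcv_round": None},
    {"race_id": "R1", "stage_id": "P", "stage_date": date(2024, 5, 1), "candidate_id": "C", "status": "Lost", "rcv_round": None},
    {"race_id": "R1", "stage_id": "G", "stage_date": date(2024, 11, 5), "candidate_id": "A", "status": "Won", "rcv_round": None},
]


def advanced_without_next_stage(rows):
    """List (race_id, stage_id, candidate_id) for Advanced rows with no later-stage row."""
    flagged = []
    for row in rows:
        if row["status"] != "Advanced":
            continue
        same_candidate = [
            other for other in rows
            if other["race_id"] == row["race_id"]
            and other["candidate_id"] == row["candidate_id"]
        ]
        # An RCV row marked Advanced may only move on to the next round of the
        # same stage, so skip it if a later round exists for this candidate.
        if any(
            other["stage_id"] == row["stage_id"]
            and (other["rcv_round"] or 0) > (row["rcv_round"] or 0)
            for other in same_candidate
        ):
            continue
        if not any(other["stage_date"] > row["stage_date"] for other in same_candidate):
            flagged.append((row["race_id"], row["stage_id"], row["candidate_id"]))
    return flagged


print(advanced_without_next_stage(rows))  # [('R1', 'P', 'B')]
```

Here Candidate B is flagged because no General row exists for them yet; if the primary was recent, that row may simply be pending creation.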

Edge cases and examples

Cross-filing

Cross-filing occurs when a candidate seeks office under more than one party. It is not allowed in every jurisdiction but is quite common in some states, such as New York and Pennsylvania. In these cases, the candidate will have more than one row in the data set, one for each party under which they appear on the ballot. Imagine a candidate, Luis Montega, who is running for a local judgeship and has cross-filed as a Democrat and a Republican.

Montega will have a row for each party’s primary he participates in, since he will appear on the ballot twice in that stage. Though each primary will have its own Stage ID, the rows will share the same Race ID, and Montega will have the same Candidate ID in all of them. Montega might lose both primaries, advance from only one, or advance from both. If he loses both, he will not have any additional rows (unless he seeks to participate in the general as a write-in). If he advances under one or both parties, he will have one additional row for the general stage; if he advances under both, that row will identify him as running under both parties.

The graphic above illustrates this for a whole race with cross-filing: Candidate A and Candidate B would each have three rows in the data set, and Candidate C would have only one. All those rows will have the same Race ID, and all of a given candidate's rows will use that candidate's Candidate ID. The Stage ID and Stage Type will help distinguish the stages from each other.
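The Montega scenario can be sketched with made-up rows in which he advances from both primaries. How the data set actually encodes several parties on one row is not specified here, so the tuple-valued `parties` key (like the other key names and Stage IDs) is an assumption. The helper collects every party a Candidate ID appears under within a race and keeps only candidates with more than one.

```python
from collections import defaultdict

# Made-up rows for the Montega example; key names, Stage IDs, and the
# tuple-valued "parties" field are hypothetical.
rows = [
    {"race_id": "J1", "stage_id": "J1-DEM-P", "stage_type": "Primary",
     "candidate_id": "MONTEGA", "parties": ("Democratic",), "status": "Advanced"},
    {"race_id": "J1", "stage_id": "J1-REP-P", "stage_type": "Primary",
     "candidate_id": "MONTEGA", "parties": ("Republican",), "status": "Advanced"},
    {"race_id": "J1", "stage_id": "J1-G", "stage_type": "General",
     "candidate_id": "MONTEGA", "parties": ("Democratic", "Republican"), "status": "Won"},
]


def cross_filers(rows):
    """Map (race_id, candidate_id) to its set of parties, keeping only multi-party candidates."""
    parties = defaultdict(set)
    for row in rows:
        parties[(row["race_id"], row["candidate_id"])].update(row["parties"])
    return {key: found for key, found in parties.items() if len(found) > 1}


print(cross_filers(rows))
```

All three rows share one Race ID and one Candidate ID, while each has its own Stage ID.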

Fusion voting is a related practice that can lead to a candidate having more than one party associated with them in your dataset.

Ranked-choice voting (RCV)

Ranked-choice voting (RCV) is an electoral system in which voters rank candidates by preference on their ballots. If a candidate wins a majority of first-preference votes, he or she is declared the winner. If no candidate wins a majority of first-preference votes, the candidate with the fewest first-preference votes is eliminated. Ballots cast for the eliminated candidate are then transferred to the next preference indicated on those ballots. A new tally is conducted to determine whether any candidate has won a majority of the adjusted votes. The process is repeated until a candidate wins an outright majority.
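The single-winner procedure above can be sketched in a few lines of Python. This is an illustration of the general technique, not Ballotpedia's or any jurisdiction's official counting code: ballots are assumed to be lists of candidate names in preference order, exhausted ballots are simply dropped from later tallies, and ties for last place are broken alphabetically as a simplification.

```python
from collections import Counter


def instant_runoff(ballots):
    """Tally a single-winner RCV contest.

    Each ballot is a list of candidate names, most preferred first.
    Returns (winner, rounds), where rounds[i] maps each continuing
    candidate to their vote total in round i + 1.
    """
    continuing = {name for ballot in ballots for name in ballot}
    rounds = []
    while True:
        # Count each ballot for its highest-ranked continuing candidate.
        # Ballots with no continuing choices are exhausted and not counted.
        tally = Counter({name: 0 for name in continuing})
        for ballot in ballots:
            for name in ballot:
                if name in continuing:
                    tally[name] += 1
                    break
        rounds.append(dict(tally))
        leader = max(tally, key=tally.get)
        if tally[leader] * 2 > sum(tally.values()) or len(continuing) == 1:
            return leader, rounds
        # Eliminate the lowest candidate; ties go to the alphabetically first
        # name, a simplification (real contests define their own tie rules).
        loser = min(sorted(tally), key=tally.get)
        continuing.discard(loser)


ballots = [["A"]] * 4 + [["B", "A"]] * 3 + [["C", "B"]] * 2
winner, rounds = instant_runoff(ballots)
print(winner)  # B
```

In this toy contest no one has a majority in round 1 (4, 3, and 2 votes), so C is eliminated and C's ballots move to B, who then leads 5 to 4.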

Ballotpedia reports each round of these results, so a candidate in an RCV stage will have a row for each round of calculation in that stage. The first round of results reported will have a Ranked-choice voting round value of 1. Rounds count up from there until the final round, so the ultimate outcome is always found in the rows with the highest round number in the stage.
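Since the outcome lives in the highest round, a common first step is to keep only each stage's final-round rows. The sketch below assumes hypothetical `stage_id`, `candidate_id`, `rcv_round`, and `status` keys, with `rcv_round` set to `None` for stages that do not use RCV; those rows pass through unchanged. The sample rows are invented.

```python
# Invented rows; key names are hypothetical. S1 uses RCV, S2 does not.
rows = [
    {"stage_id": "S1", "candidate_id": "A", "rcv_round": 1, "status": "Advanced"},
    {"stage_id": "S1", "candidate_id": "B", "rcv_round": 1, "status": "Advanced"},
    {"stage_id": "S1", "candidate_id": "A", "rcv_round": 2, "status": "Lost"},
    {"stage_id": "S1", "candidate_id": "B", "rcv_round": 2, "status": "Won"},
    {"stage_id": "S2", "candidate_id": "Z", "rcv_round": None, "status": "Won"},
]


def final_round_rows(rows):
    """Keep non-RCV rows plus, for each RCV stage, only its highest-round rows."""
    last = {}
    for row in rows:
        rnd = row["rcv_round"]
        if rnd is not None:
            last[row["stage_id"]] = max(last.get(row["stage_id"], rnd), rnd)
    return [
        row for row in rows
        if row["rcv_round"] is None or row["rcv_round"] == last[row["stage_id"]]
    ]


print([(row["candidate_id"], row["status"]) for row in final_round_rows(rows)])
# [('A', 'Lost'), ('B', 'Won'), ('Z', 'Won')]
```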

For example, consider a race with four candidates using RCV. Voters ranked the candidates, and now we are trying to tally up a winner. The first round would look like this:

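| Candidate | Ranked-choice voting round | Votes for | Votes against | Status |
| --- | --- | --- | --- | --- |
| Candidate A | 1 | 2000 | | Advanced |
| Candidate B | 1 | 1800 | | Advanced |
| Candidate C | 1 | 1500 | | Advanced |
| Candidate D | 1 | 500 | | Advanced |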

The candidates receive a status of Advanced because they proceed to the next round of results calculation. None of our candidates received a majority in the first round (Candidate A's 2,000 of 5,800 votes is about 34%), so Candidate D, who has the fewest votes, is eliminated and those votes are redistributed in a second round. Note that some voters may not have marked a second preference; if their first preference (Candidate D) is eliminated at this point, their vote will not be transferred to any other candidate.

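| Candidate | Ranked-choice voting round | Votes for | Votes against | Status |
| --- | --- | --- | --- | --- |
| Candidate A | 2 | 100 | | Advanced |
| Candidate B | 2 | 300 | | Advanced |
| Candidate C | 2 | | | Advanced |
| Candidate D | 2 | | 500 | Lost |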

In this second round (and all subsequent rounds), we report the transferred votes, not a vote total for the candidate. In round 2, our eliminated candidate has all their votes transferred away, and they now have a status of Lost, which will be repeated in later rounds. The other candidates receive votes from the ballots cast for Candidate D on which they were the second preference. A candidate's vote total as of any given round can be found by summing their votes for across all rounds up to and including that one and subtracting the sum of their votes against (for example, Candidate A now has 2,000 + 100 = 2,100 votes). None of our candidates has a majority yet, so we proceed to one more round, in which a winner can be determined:

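| Candidate | Ranked-choice voting round | Votes for | Votes against | Status |
| --- | --- | --- | --- | --- |
| Candidate A | 3 | 100 | | Lost |
| Candidate B | 3 | 700 | | Won |
| Candidate C | 3 | | 1575 | Lost |
| Candidate D | 3 | | | Lost |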
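The running-total rule described above (sum the votes for, subtract the votes against) can be sketched as follows. The rows reproduce the first two rounds for Candidates A, B, and D from the example tables, with blank cells treated as 0; the key names are hypothetical.

```python
from collections import defaultdict

# Rounds 1 and 2 for Candidates A, B, and D from the example; blank cells are
# treated as 0. Key names are hypothetical stand-ins for the data set's columns.
rows = [
    {"candidate": "Candidate A", "rcv_round": 1, "votes_for": 2000, "votes_against": 0},
    {"candidate": "Candidate A", "rcv_round": 2, "votes_for": 100, "votes_against": 0},
    {"candidate": "Candidate B", "rcv_round": 1, "votes_for": 1800, "votes_against": 0},
    {"candidate": "Candidate B", "rcv_round": 2, "votes_for": 300, "votes_against": 0},
    {"candidate": "Candidate D", "rcv_round": 1, "votes_for": 500, "votes_against": 0},
    {"candidate": "Candidate D", "rcv_round": 2, "votes_for": 0, "votes_against": 500},
]


def running_totals(rows):
    """Map (candidate, round) to the candidate's vote total after that round."""
    totals = {}
    balance = defaultdict(int)
    # Process each candidate's rounds in order so the balance accumulates correctly.
    for row in sorted(rows, key=lambda r: (r["candidate"], r["rcv_round"])):
        balance[row["candidate"]] += row["votes_for"] - row["votes_against"]
        totals[(row["candidate"], row["rcv_round"])] = balance[row["candidate"]]
    return totals


totals = running_totals(rows)
print(totals[("Candidate A", 2)])  # 2100
```

Candidate D's total drops to 0 in round 2, matching the transfer of all 500 of D's votes.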

There are some variations on how RCV works. In some multi-seat races (rows in your data set where the number of seats up for election is greater than 1), one candidate may meet the majority requirement in an earlier round and receive a status of Won in that stage, while the other winners require more rounds of calculation to determine. In other multi-seat RCV races, the rounds may be used to calculate a first winner and then repeated without that winner included to determine additional winners.
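For the first variation, a sketch like the one below can show the round in which each winner of a multi-seat stage was first marked Won, and check that the number of winners matches the seats up for election. The rows, the key names, and the `seats_up` value are hypothetical, and the sketch does not handle the second variation, where rounds are repeated for each additional seat.

```python
# Invented rows for one two-seat RCV stage; key names are hypothetical.
rows = [
    {"candidate_id": "A", "rcv_round": 1, "status": "Won"},
    {"candidate_id": "B", "rcv_round": 1, "status": "Advanced"},
    {"candidate_id": "C", "rcv_round": 1, "status": "Advanced"},
    {"candidate_id": "A", "rcv_round": 2, "status": "Won"},
    {"candidate_id": "B", "rcv_round": 2, "status": "Won"},
    {"candidate_id": "C", "rcv_round": 2, "status": "Lost"},
]
seats_up = 2  # hypothetical stand-in for the seats-up-for-election value


def first_won_round(rows):
    """Map each winning candidate_id to the first round in which its status is Won."""
    first = {}
    for row in sorted(rows, key=lambda r: r["rcv_round"]):
        if row["status"] == "Won":
            first.setdefault(row["candidate_id"], row["rcv_round"])
    return first


winners = first_won_round(rows)
print(winners)  # {'A': 1, 'B': 2}
print(len(winners) == seats_up)  # True
```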

Presidential elections

Every four years, the presidential election presents a notable exception, in that it is a series of state-level Races for a single, national Office.

Many candidates file with the Federal Election Commission to run for this office and are given the candidate status of “Candidacy Declared.” However, only a portion of them will go on to appear on the ballot and participate in the state- and party-specific Stages. As a result, the same candidate will often appear in the data set many times: once for Iowa’s Republican Caucus, once for a third-party primary in Nebraska, once for the Illinois Republican Primary, and so on. Those who make it to the general election can still appear in multiple, distinct Stages, one for each state in which they or their political party qualified to run.
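To see where a presidential candidate appears, one option is to group their rows by Stage Type and collect the states, as sketched below. The rows are invented, and the key names and the "Caucus" stage type value are assumptions rather than documented values.

```python
from collections import defaultdict

# Invented presidential rows; key names and stage_type values are hypothetical.
rows = [
    {"candidate_id": "P1", "state": "Iowa", "stage_type": "Caucus", "party": "Republican"},
    {"candidate_id": "P1", "state": "Illinois", "stage_type": "Primary", "party": "Republican"},
    {"candidate_id": "P1", "state": "Illinois", "stage_type": "General", "party": "Republican"},
    {"candidate_id": "P2", "state": "Nebraska", "stage_type": "Primary", "party": "Libertarian"},
]


def states_by_stage_type(rows):
    """Map candidate_id -> stage_type -> sorted list of states where that candidate appears."""
    found = defaultdict(lambda: defaultdict(set))
    for row in rows:
        found[row["candidate_id"]][row["stage_type"]].add(row["state"])
    return {
        candidate_id: {stage_type: sorted(states) for stage_type, states in by_type.items()}
        for candidate_id, by_type in found.items()
    }


print(states_by_stage_type(rows)["P1"])
# {'Caucus': ['Iowa'], 'Primary': ['Illinois'], 'General': ['Illinois']}
```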

Note, too, that the delegates pledged attribute applies only to the Presidential race.

