Thursday, January 10, 2013

Open data of U.S. House legislation now available in bulk format

It’s a good week for open government in the United States Congress. On Tuesday, the Clerk of the House made House floor summaries available in bulk XML format. Today, the House of Representatives announced that it will make all of its legislation available for bulk download in a machine-readable format, XML, in cooperation with the U.S. Government Printing Office. As Nick Judd observes at TechPresident, such data is catnip for developers.


This change has been a long time coming, although more needs to be done to fully open the People’s House to the People. In April 2011, Speaker Boehner and Majority Leader Cantor sent a letter to the House Clerk about releasing legislative data. In September 2011, a live XML feed for the House floor went online. In September 2012, Congress launched a beta version of a new legislative information website but failed to open its data.

“Thanks to GPO, all House bills for this Congress will be available in one XML file that can be downloaded by anyone,” said Speaker of the House John Boehner in a statement. “This is a win for every American who believes in open government. Making legislative data easier to use for third parties, developers, and anyone interested in how Congress is tackling current challenges is a priority for House leaders. We’re going to keep working to make the legislative process more transparent and to better connect lawmakers with the people we serve.”

In a post on Tuesday, Don Seymour, digital communications director for the Speaker of the House, detailed the progress made during the 112th Congress:

This project is the first of several to be rolled out in the 113th Congress that were coordinated or initiated by the Legislative Branch Bulk Data Task Force. The task force was created to expedite the process of providing bulk access to legislative information and to increase transparency for the American people. It includes the House Clerk, legislative branch agencies such as the Government Printing Office and Library of Congress, representatives from House leadership and key committees, and the House Chief Administrative Officer.

Open government is and has been a priority for House leaders. In fact, the Clerk began offering real-time updates on House floor proceedings in XML back in 2011. The feed of real-time information complemented a new video streaming feature they set up for desktop and mobile devices. The House also began utilizing new low-cost video conferencing tools, streaming committee hearings online, working with developers and transparency advocates, and more.

As Speaker Boehner said, this is good news for every American. Despite the abysmal public perception of Congress, genuine institutional changes in the House of Representatives, driven by the GOP’s embrace of innovation and transparency, have been happening over the last three years.

As Tim O’Reilly observed in 2011, the current House leadership is doing a better job on transparency than its predecessors did. Jim Harper’s analysis of the government’s data publication process substantiates that progress. Writing at the Cato Institute, Harper praised the House for this step forward:

I believe the public has an Internet-fueled expectation that they should understand what happens in Congress. It’s one explanation for rock-bottom esteem for government in opinion polls. Access to good data helps produce better public understanding of what goes on in Washington and also, I believe, more felicitous policy outcomes—not only reduced demand for government, but better administered government in the areas the public wants it.

…and offered some constructive criticism for improvement:

For now, this data is of limited use because it includes only House bills. The entire oeuvre of congressional bill-writers should be published the same way in the same place so that contrasts and comparisons can be drawn among House and Senate work. In short, why is the Senate not on board?

As far as I’ve been able to find, the XML is not well documented. What each of the technical codes means is understood by several people in Washington’s transparency community, but the idea is to make it available very broadly, so the documentation should be very strong. The existing documentation should be updated, tightened up, and made easily available to the people gathering bill data on FDsys.

The XML data structures put in bills are limited in terms of what they convey. There is rudimentary information about who introduced and cosponsored bills, what committees they were referred to, and other procedural information. That’s good. But the effects of bills—on agencies, existing law, programs, places—this is not available in machine-readable code. That would be great.

Josh Tauberer, the author of “Open Government Data,” added some caveats about the House’s move to bulk bill XML on his blog. Tauberer is the civic hacker behind GovTrack.us, which has been scraping legislative data and making it more open for years.

In his comments, excerpted below, he notes that there’s “no new data here, and thus not the data that the bulk legislative data advocates have been asking for.” In other words, this is evolutionary change, not revolutionary change.

What we’re seeing with the bills bulk data project is how the wave of culture change is moving through government. Over the last two years the House Republican leadership has embraced open government in many ways (my 112th Congress recap | the new House floor feed). With this bills XML project, we’re seeing more legislative support agencies being involved in how the House does open government.

This isn’t a technical feat by any means, but it is a cultural feat. The House and GPO worked together to institutionalize a new way for the House to publish bulk data.

Because of the way Data.gov is managed in the executive branch, we’ve become accustomed to big announcements. The bills bulk data project and the other recent projects show that the House is taking a different approach, an incremental approach, to open government data: publish early and often, gather feedback, then go on to bigger projects. This is something open government advocates have been asking for.

As I mentioned, the tech side itself is not much. They took files they and the Library of Congress already make available (and in some sense already in bulk) and zipped them up into up to 16 ZIP files. (4 files now, but that will probably grow to 16 by the end of the Congress.) So there’s no new data here, and thus not the data that the bulk legislative data advocates have been asking for. But it’s on the road to that. The files involved in this project have the text of legislation but not bill status, which is what the bulk data advocates have been asking for.

As we head into 2013, here’s hoping that the United States Senate follows the lead the House has taken in making itself more accessible to the hundreds of millions of people its senators represent around the country.

This post has been updated with new links and commentary.

