[Open-Legislation] Public domain US legal data and code
eric at sunlightfoundation.com
Fri Oct 5 14:57:44 UTC 2012

I've been working for the last month or two with Josh Tauberer (of
GovTrack.us <http://govtrack.us>) and Derek Willis on a project to produce
a public domain scraper and dataset from THOMAS.gov <http://thomas.gov>,
the official source for legislative information for the US Congress.
It's a reasonably well-documented set of Python scripts, which you can find

We just hit a great milestone - it gets everything important that THOMAS
has on bills, back to the year THOMAS starts (1973). We've published and
documented <https://github.com/unitedstates/congress/wiki> all of this data
in bulk, and I've worked it into Sunlight's pipeline, so that searches for
bills in Scout <https://scout.sunlightfoundation.com/search/federal_bills/freedom%20of%20information> use
data collected directly from this effort.

The data and code are all hosted on GitHub on a
organization, which is right now co-owned by me, Josh, and Derek - the
intent is to have this all exist in a common space. To the extent that the
code needs a license at all, I'm using a public domain
that should at least be sufficient for the US (other suggestions welcome).

There's other great stuff in this organization, too - Josh made an amazing
donation of his legislator
and converted it to YAML for easy reuse. I've worked that dataset into
Sunlight's products already as well. I've also moved my legal citation
extractor <https://github.com/unitedstates/citation> into this organization
-- and my colleague Thom Neale has an in-progress parser for the US
to convert it from binary typesetting codes into JSON.

GitHub's organization structure actually makes possible a very neat
commons. I'm hoping this model proves useful, both for us and for the

Developer | sunlightfoundation.com