Q&A: Open data portal

Connecticut plans to launch its open data portal in about a month. Several residents, from data scientists to elected representatives, have different perspectives on what the portal means. So we reached out to a few of them to get their take:

Mark Abraham, Executive Director of CT Data Haven
Mark Abraham is executive director of Data Haven, a New Haven nonprofit that compiles and shares community indicator data.

Rep. Roland Lemar, 96th district

Roland Lemar represents the 96th Assembly District, which includes parts of New Haven and East Haven. He is assistant majority whip for the House Democrats and vice chair of the Finance, Revenue and Bonding Committee.

Scott Gaul

Scott Gaul is director of the Hartford Foundation’s Community Indicators Project, a relatively new initiative to gather and analyze community data focused on the workforce and education.

Sheryl Horowitz

Sheryl Horowitz is the community research and evaluation director at the Connecticut Association for Human Services. She went to CAHS to implement “results based accountability evaluation.” She was also co-chair of a data committee for the Early Childhood Cabinet, where they discussed ways to make data public.

Zack Beatty

Zack Beatty runs a civic hackers group in New Haven, which meets occasionally to explore open data. He also works for SeeClickFix, a service that allows citizens to report issues — like potholes — to their municipal governments.

How can a state open data portal affect what you do? What your organization does?

Abraham: An open data portal for Connecticut can make it easier for us to routinely obtain updated data on a variety of conditions that impact the people and neighborhoods that we serve. Our organization’s mission is to help interpret, add context to and share this type of information, for example, by cross-tabulating it with data from the Federal government’s many hundreds of existing data portals. By attracting new users, I think that a portal can help foster a statewide culture of data sharing and data-informed decision making, thereby making it easier for state agencies to release better data.

Lemar: It is my hope that the data portal will provide clarity and confidence in the work that our State agencies do, but more, it will provide an opportunity for citizens to look into the impacts that an agency is expected to deliver and measure those results against expectations. It will allow everyone — citizens, policy advocates and public officials alike — to make smarter policy choices.

Gaul: Increasing access to timely, detailed data will help identify areas where we are making progress and help to identify emerging issues for the Hartford region. For a community foundation, a strong open data portal can help us to stay on top of the changing needs for communities within our region.

Horowitz: We monitor and report on indicators that reflect and impact child and family welfare across much of the state. Having a central place where data is readily available and updated would make our work easier and possibly more accurate.

Beatty: The mission of New Haven Open Data Meetup is to foster collaboration and learning around “civic hacking” in Connecticut. We’re taking a Big Tent approach to building a community, welcoming developers, designers, technologists, policy wonks, bureaucrats and anyone else who wants to contribute to the Open Data and Open Government movement in Connecticut. With that in mind, having an official Connecticut Open Data Portal will be a boost for us on several fronts: 1) It will raise the profile of the concepts of Open Data and Open Government locally, hopefully leading to increased membership; 2) It will lower the barrier to entry for our developers who are looking for real data sets to hack on; and 3) It will bring local context and relevance to any projects that previously relied on Open Data sets from sources outside Connecticut.

What, in your mind, would make this portal a success?

Abraham: Generally speaking, the quality of public data improves only when used by the public. If a new portal can help bring more eyes to the state’s administrative data, I think that we’ll see more public requests for high-quality data – by that I mean data that are more timely, more meaningful, more accurate, and also disaggregated (broken down) by neighborhood and demographic group.

Lemar: Clear, easy-to-use and frequently updated data that is both user-friendly and deep enough to allow citizens to truly dig into and understand the work of our State agencies. It should allow fingertip access to answers that a citizen wants about how their government is functioning and where their tax dollars are going.

Gaul: More easily accessible data, faster publication of data, more detailed data or new data would all be successes for the portal. Ideally, the state can use the portal to incentivize improvement on all of these fronts. At the same time, the entire burden for use of the portal shouldn’t fall on the state — we have nonprofits, journalists, advocacy groups, foundations, researchers and other data platforms that can help ensure the data is used accurately and effectively. I’d also love to see Connecticut start to get some recognition for the portal — we should be at the cutting edge for this kind of thing.

Horowitz: If the agencies reporting data were held to a common standard with common reporting conventions and regular updating.

Beatty: I’d like to see the State take the initiative to invite cities and towns to participate in the portal, by offering to federate data between state and municipal sources. This would be a citizen-centric approach that transcends the boundaries of internal political jurisdictions. Our daily lives transcend political and administrative boundaries.

What kind of challenges do you foresee for this portal to achieve that success?

Abraham: There is no such thing as a perfect data set. Administrative data sets, even those from the U.S. Census Bureau, are attempts to measure complex social conditions. These conditions and how they are expressed as data almost always change over time, and data sets themselves are prone to human error. Typically, extensive “metadata” (information about how data are collected) are needed before data will be of real value. Because of these issues, governments are sometimes reluctant to release their most useful data.

Lemar: Achieving the right balance: protecting the privacy needs of some businesses and individuals served by state agencies, and the proprietary needs of vendors doing work for the State, while also delivering the basic information that citizens deserve to have access to.

Gaul: There have to be incentives in place — both carrots and sticks — for agencies to provide data faster and with more detail. Otherwise, we’re likely to see the initial excitement diminish over time. The Obama administration faced these same challenges with Data.gov, which led to an Executive Order on open data in 2013. A related issue is the availability of listings or schemas for the data housed within individual agencies. That may sound wonky, but those listings provide a map to tell the public what data the agencies collect. Without that, it’s impossible to know if agencies are really making good on the prospects for full transparency.

Horowitz: There needs to be communication between agencies for an agreed upon standard to work. Agencies need to have the resources to produce accurate and timely data.

Beatty: Marketing. The State is going to need to get creative with drawing attention to the portal, and the kinds of data being exposed. Perhaps that’s where our group might be able to help. I certainly would like to see a series of developer hackathons this year, with prizes sponsored by the State. I’m told humans tend to like contests.

The phrase “culture of transparency” has been oft-repeated. But do you believe this portal project really shows a commitment to achieving that culture? Why or why not?

Abraham: Connecticut already posts a mind-boggling amount of state data on the ct.gov website, or otherwise makes it available to the public. A new website in and of itself would not demonstrate a commitment to greater transparency, but if it became a platform that created more data users (both within and outside of government), encouraged agencies to release their highest-value data sets in user-friendly formats, and led to higher-quality data through some of the pathways I mentioned above, then it would be a major step forward in building a more efficient, capable government.

Lemar: The portal, itself, does not illustrate a full commitment to transparency culture. Full transparency goes much deeper and is a constant evolution in data-sharing and open communications between government agencies and the public. This is but a small, yet important step toward increased transparency.

Gaul: It’s definitely a step in the right direction, and there needs to be some first step. Demonstrating the benefits of increased transparency would further support a culture shift. Again, the network of organizations interested in making use of the data can help here.

Horowitz: It depends on what data is included and how it is included. Data suppression needs to be clearly defined and have a solid basis for its use.

Beatty: Launching an Open Data portal is a great first step on the road towards Open Government, which I ultimately see as a prerequisite for a functioning (Participatory?) Democracy. Let’s give the state credit for this initiative; it certainly shows the signs of transparency. While CT isn’t the first state to launch such an Open Data portal, we’re not the last. That said, open data is a means to an end. We need to see the movement towards transparency permeate each and every state agency. Beyond that, I think this is going to have a big impact on municipal governments, who might be behind the curve on transparency. The State has a tremendous opportunity to lead from the top on this. I’d like to see this commitment towards transparency followed town-by-town, city-by-city throughout the state.

What data would you like to see on the portal?

Abraham: Connecticut already publishes a significant amount of data by town. However, there is far more variation within each of our state’s towns than between one town and any other. With consideration for privacy, I would like to see all of these existing data sets published at a neighborhood level (i.e., by Census Tract or legislative district), as well as by age group. I would like to see an update to the well-known “Million Dollar Blocks” project – that is, a data set that tabulates spending at our Department of Correction each year, broken down by corresponding neighborhood and type of spending.

Lemar: Any and all data sets that can be made available. The best use of the portal is not to answer a specific question or provide a specific answer, but instead to allow a specific user access to the information that they want.

Gaul: Education data by the town or, better, neighborhood where the student lives. This data is collected and has local relevance. Currently, most education data is released by school and school district. Many students in the Hartford region participate in Open Choice programs or attend magnet or charter schools or other alternatives. For example, by 10th grade, almost half of Bloomfield residents attend magnets, charters, nonpublic schools or Open Choice programs. Consequently, district-level data only provides part of the story and likely understates the performance of many communities. The Hartford Foundation is piloting the Early Development Inventory (EDI) to provide data on kindergarten readiness by neighborhood for Hartford and West Hartford, but the same principle applies to education metrics at all grade levels.

Horowitz: A longitudinal look at the grants that are funded and refunded and the outcomes of these programs.

Beatty: Everything under the sun related to transportation data: from DOT, State Police and DMV. Crash stats, usage, infrastructure projects, budgets, etc. As an avid cyclist and public transit user, I’d like to see 3rd party developer projects that visualize the current state of our transportation infrastructure, with an eye towards a more sustainable and sensible mix of bicycle and rail facilities with the current automobile-centric design. As this all relates to budgets, perhaps this is a good time to give a nod towards our State Comptroller Kevin Lembo, who has already launched an open portal to expose all state financial data, called Open Connecticut.

What would it take for this portal to help legislators make data-driven decisions?

Abraham: There should be features to help users obtain and/or analyze data by state legislative district. This gets to the need for neighborhood-level data, aggregated over longer time periods if necessary to ensure accuracy and privacy, which I mentioned above. Because government tends to publish its data based on citizen need, you often don’t see this level of data quality unless you attract a high level of use, which is something an open data portal can help with.

Lemar: Clean, clear and thorough data, consistently updated and open to all users. Legislators make the best decisions when their constituents are well-informed.

Gaul: Right now, legislators and others have a difficult task using publicly available data to make decisions. Pretend you were a baseball fan, and someone asked you to explain why the Red Sox got so much better last year, but they wouldn’t let you see any individual player’s stats – whether players were traded, injured, got better, worse, etc. All you had to go on were team stats. You know they got better, but why? It’s an almost impossible task with only aggregated data, but that’s the position legislators are in now. In the long run, we should be looking for ways to support data sharing in a protected environment of individual record-level data across systems – that will help to unlock insights you can’t get from aggregated data. Projects like P20-WIN that connect data from early childhood to K-12 to higher ed have the potential to provide this.

Horowitz: See above.

Beatty: I don’t know. I suppose they would need to cross-reference individual bills pending in the General Assembly with any data sources that might be related to the legislation. Perhaps someone could create a mashup of the CGA website data with this Portal data. But wouldn’t any bill-to-data relationships be subjective? I don’t know, it’s an interesting challenge.

What’s your favorite example of government data use — either within government, or by private citizens?

Abraham: I like working with data that serves a day-to-day need. In New Haven, SeeClickFix is heavily used by citizens to report issues related to public spaces. The service functions as our local government’s “311” data system. These data sets allow local officials, such as the new head of our city’s Department of Transportation, Doug Hausladen, to monitor issues in real time, and all the data are open to the public. On their blog is a map we created that cross-tabulates several years of SeeClickFix data with local crime data to look at the relationship of the two.

Lemar: This is tough – so many good examples, but I like what GreyWall Software and SeeClickFix are doing to use government information to improve service provision and emergency management response.

Gaul: Let me give two examples, one positive, one less so:
Baby Name Voyager is fun, useful and presented beautifully. It wouldn’t exist without government data. A more cautionary tale comes from India. In short, government and aid agencies pushed for release of digitized land records in two states. Upon release, most of the data was used by large-scale real estate developers to purchase properties and to push out smaller-scale firms, which was definitely not the intended effect. Examples like this illustrate how data can ‘empower the empowered’ and are a reminder that the skills to work with and analyze complex data are not always distributed equitably. We need nonprofits, journalists and citizens to be vocal about the use and interpretation of data, as much as we need government to be active as a provider.

Horowitz: The [U.S.] Census and the American Community Survey.

Beatty: At my day job, I work at civic startup SeeClickFix, makers of a web and mobile platform for reporting non-emergency problems to your local government. Our dataset is 100 percent open and nonproprietary, and we’re the largest ad-hoc Open311-compliant platform in the world. So I love seeing municipal governments take the plunge towards Open311, away from the traditional black box service request tools of the past (phone, email, etc). Once they get past their initial fear of “opening the floodgates,” local governments always value the cost-savings and citizen engagement benefits of an open platform.
