Bibliography | Council for Big Data, Ethics, and Society

Acquisti, Alessandro, Ralph Gross, and Fred Stutzman. "Face Recognition and Privacy in the Age of Augmented Reality." Journal of Privacy and Confidentiality 6, no. 2 (December 30, 2014). http://repository.cmu.edu/jpc/vol6/iss2/1

This paper demonstrates the use of facial recognition as an auxiliary dataset to reidentify anonymous individuals in public spaces and on dating apps. It also demonstrates the ability to use facial recognition to collect "Personally Predictive Information," such as social security numbers, from anonymous individuals by combining face recognition with data mining algorithms and statistical re-identification techniques. This raises significant concerns about the role of data mining as online and offline worlds blend increasingly seamlessly with augmented reality technologies.

Acquisti, Alessandro, Curtis R. Taylor, and Liad Wagman. "The Economics of Privacy." SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, March 8, 2016. http://papers.ssrn.com/abstract=2580411

Synthesizes perspectives from multiple fields to propose an economic theory of privacy, showing the relative value of control and disclosure of private information. Individual disclosers of information are often subject to an asymmetrical power relationship to the economically powerful collectors of information, establishing a fraught ethical terrain.

Ajunwa, Ifeoma. "Genetic Data and Civil Rights." SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, August 14, 2015. http://papers.ssrn.com/abstract=2460897

Examines how the U.S. Genetic Information Nondiscrimination Act (2008) could not have anticipated or regulated certain big-data applications of genetic testing in the workplace. The author shares several helpful examples of how seemingly neutral genetic testing practices lead to racialized disparities in the workplace.

Andrejevic, Mark. "Big Data, Big Questions: The Big Data Divide." International Journal of Communication 8, no. 0 (June 16, 2014): 17

Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. "Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And It’s Biased Against Blacks." ProPublica, May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

This important investigative reporting shows how predictive software used in many U.S. courts’ sentencing and bail proceedings is systematically biased against non-white defendants. Although the predictive algorithm does not directly account for demographic factors when producing a score for likely recidivism, it does use social and economic data that are functionally a proxy for historically biased social and political relations. Additionally, because the algorithm is treated as a proprietary secret, it is not available for public examination or adversarial procedures in court.

Barocas, Solon, and Andrew D. Selbst. "Big Data’s Disparate Impact." SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, August 14, 2015. http://papers.ssrn.com/abstract=2477899

This paper challenges the notion that data mining and machine learning are inherently unbiased methods for decision making. The authors examine bias and fairness in big data through the lens of US laws that bar discrimination in employment. Because algorithms would appear to outsource decisions to machines—and thus appear to escape the legal requirement that victims of discrimination must be able to demonstrate an intent to discriminate—the best hope for legal relief may come from doctrines of disparate impact. The authors examine the challenges of using antidiscrimination law to address big data and show the difficulty of applying existing methods to remedy discrimination.

Bowker, Geoffrey C., and Susan Leigh Star. Sorting Things Out: Classification and Its Consequences. MIT Press, 2000

A widely influential collection of essays in critical information theory that centers classificatory systems in the study of technoscience. Illustrates the ethical dynamics of information systems by demonstrating that while classification systems can be deeply oppressive by "torquing" the possibilities of people’s lives, people are often able to find degrees of agency within the same system that oppresses them.

boyd, danah, and Kate Crawford. "Critical Questions for Big Data." Information, Communication & Society 15, no. 5 (June 1, 2012): 662–79. doi:10.1080/1369118X.2012.678878

In this widely cited piece, the authors offer a broad interrogation of big data as a technology, analytical method and a mythology. They provide six "provocations" to inform the growing field of critical data studies: big data changes the definition of knowledge; claims to objectivity and accuracy are misleading; bigger data are not always better data; taken out of context, big data loses its meaning; just because it is accessible does not make it ethical; limited access to big data creates new digital divides.

Brandimarte, Laura, Alessandro Acquisti, and George Loewenstein. "Misplaced Confidences: Privacy and the Control Paradox." Social Psychological and Personality Science 4, no. 3 (2013): 340–347

Examines the paradox that giving individuals more control over their personal data tends to lead them to release more sensitive data by increasing their confidence that the collector of the data has good intentions. This shows how some approaches to increasing privacy controls may backfire.

Brunton, Finn, and Helen Nissenbaum. "Vernacular Resistance to Data Collection and Analysis: A Political Theory of Obfuscation." First Monday 16, no. 5 (April 26, 2011). http://firstmonday.org/ojs/index.php/fm/article/view/3493

Calo, Ryan. "Consumer Subject Review Boards: A Thought Experiment." Stanford Law Review Online 66 (September 3, 2013): 97

Proposes the adoption of "Consumer Subject Review Boards" as an IRB-analog for commercial applications of big data.

Cohen, Julie E. "What Privacy Is For." Harvard Law Review 126 (2013): 1904

Crawford, Kate, and Jason Schultz. "Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms." BCL Rev. 55 (2014): 93

Commercial applications of big data effectively skirt privacy regulations meant to protect Personally Identifiable Information (PII). This article proposes an approach to mitigate "predictive harms" through procedural data due process.

"Data & Civil Rights." Data & Civil Rights, n.d. http://www.datacivilrights.org/2014/

Data & Civil Rights is an annual conference that brings together leaders in civil rights, government, industry, and academic research, to address an urgent question: How do we preserve and strengthen civil rights in the face of data-driven technological change?

Data Science Association. "Code of Professional Conduct." Accessed July 14, 2016. http://www.datascienceassn.org/code-of-conduct.html

The code of conduct produced by the Data Science Association.

Davis, Kord, and Doug Patterson. Ethics of Big Data: Balancing Risk and Innovation. 1st edition. Sebastopol, CA: O’Reilly Media, 2012

A broad look at ethical issues in big data, with an eye toward business practices.

Diakopoulos, Nicholas. "Algorithmic Accountability Reporting: On the Investigation of Black Boxes." Tow Center for Digital Journalism, 2013. http://towcenter.org/wp-content/uploads/2014/02/78524_Tow-Center-Report-WEB-1.pdf

Algorithms are increasingly important power-brokers in society, but critics and journalists rarely have the tools necessary to unpack exactly how algorithms leverage power over society and individuals. This report details methodological tools for understanding and reverse engineering the input/output relationships in algorithms. The author details the mechanics behind the atomic decisions that algorithms make, including prioritization, classification, association, and filtering.

Duhigg, Charles. "How Companies Learn Your Secrets." The New York Times, February 16, 2012. http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html

Investigative reporting that details how companies compile predictive profiles of customers by correlating disparate data points. These profiles allow the companies to micro-target customers using highly personal information that the customers may not know they have shared.

Dwork, Cynthia, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. "Fairness through Awareness." In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–226. ACM, 2012. http://dl.acm.org/citation.cfm?id=2090255

This technical paper illustrates mathematical techniques for achieving "fairness in classifications" made in machine learning. The authors identify how algorithms can be trained to follow a fairness constraint so similar persons are treated similarly without causing discrimination or losing the utility of the classification system. The technical insights are applied to the problem of statistical parity in affirmative action admission policies at colleges.

Dwork, Cynthia, and Deirdre K. Mulligan. "It’s Not Privacy, and It’s Not Fair." Stanford Law Review Online 66 (September 3, 2013): 35

This essay examines how algorithmic decision-making relies on classification systems that are not inherently fair or neutral representations of human behavior or groupings. The authors contend that the best way to understand and regulate machine learning as a sociotechnical system is by giving values an analytic priority over technical concerns—machine-made decisions are ultimately representations of what human programmers value.

Executive Office of the President. "Big Data and Privacy: A Technological Perspective." Report to the President. The White House, May 2014. https://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_big_data_and_privacy_-_may_2014.pdf

A report from the Obama Administration examining how privacy can be protected while maximizing the social, economic and scientific utility of big data.

Pasquale, Frank. The Black Box Society: The Secret Algorithms That Control Money and Information. Cambridge: Harvard University Press, 2015

Friedman, Batya, and Helen Nissenbaum. "Bias in Computer Systems." ACM Transactions on Information Systems (TOIS) 14, no. 3 (1996): 330–347

Goodman, Alyssa, Josh Peek, Alberto Accomazzi, Chris Beaumont, Christine L. Borgman, How-Huan Hope Chen, Merce Crosas, Christopher Erdmann, August Muench, and Curtis Wong. "The ‘Paper’ of the Future," 2016. https://www.authorea.com/users/23/articles/8762/_show_article

An exploration of how scholarly papers will change with the advent of collaborative writing tools and data storage options.

Goodman, Alyssa, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman, Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, et al. "Ten Simple Rules for the Care and Feeding of Scientific Data." PLOS Computational Biology, April 24, 2014. doi:10.1371/journal.pcbi.1003542

A set of guidelines for handling and sharing scientific data.

Graham, Mark, Matthew Zook, and Andrew Boulton. "Augmented Reality in Urban Places: Contested Content and the Duplicity of Code." Transactions of the Institute of British Geographers 38, no. 3 (2013): 464–479

Haklay, Mordechai (Muki). "Neogeography and the Delusion of Democratisation." Environment and Planning A 45, no. 1 (January 1, 2013): 55–69. doi:10.1068/a45184

Hardt, Moritz. "How Big Data Is Unfair: Understanding Unintended Sources of Unfairness in Data Driven Decision Making." Medium, September 26, 2014. https://medium.com/@mrtz/how-big-data-is-unfair-9aa544d739de

Seeks to debunk the idea that machine learning is fair by default. Emphasizes how historically unfair social categories are multiply encoded in any adequately rich data space. This occurs because many data points that appear neutral at first look (such as credit scores) are functionally proxies for histories of social, political and economic exclusion (such as redlining encouraged by federal housing policies). Additionally, the disparate sample sizes inherent in modeling minority and majority populations disadvantage the minority population when machine learning is used to make population-wide decisions.

Introna, Lucas D., and Helen Nissenbaum. "Shaping the Web: Why the Politics of Search Engines Matters." The Information Society 16, no. 3 (2000): 169–185

Jackman, Molly, and Lauri Kanerva. "Evolving the IRB: Building Robust Review for Industry Research." Washington and Lee Law Review Online 72, no. 3 (2016): 442

Jackson, Steven J., Tarleton Gillespie, and Sandy Payette. "The Policy Knot: Re-Integrating Policy, Practice and Design in CSCW Studies of Social Computing." In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, 588–602. CSCW ’14. New York, NY, USA: ACM, 2014. doi:10.1145/2531602.2531674

Johnson, Jeffrey Alan. "From Open Data to Information Justice." Ethics and Information Technology 16, no. 4 (2014): 263–274

Kitchin, Rob. "Big Data, New Epistemologies and Paradigm Shifts." Big Data & Society, no. Theory & Ethics of Big Data (June 2014). doi:10.1177/2053951714528481

An examination of how the new epistemological paradigms of data analytics lead to new ethical challenges.

Lazer, David, Alex (Sandy) Pentland, Lada Adamic, Sinan Aral, Albert Laszlo Barabasi, Devon Brewer, Nicholas Christakis, et al. "Life in the Network: The Coming Age of Computational Social Science." Science (New York, N.Y.) 323, no. 5915 (February 6, 2009): 721–23. doi:10.1126/science.1167742

PMID: 19197046
PMCID: PMC2745217.

Lerman, Jonas. "Big Data and Its Exclusions." SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, September 3, 2013. http://papers.ssrn.com/abstract=2293765

Discussions of justice and big data often focus on the risks of inclusion in big data systems—police profiling, redlining, credit scores, etc. This paper examines the justice consequences of exclusion from those same systems. Some socioeconomically disadvantaged populations exist on the periphery of big data ecosystems and thus their preferences and needs are not adequately accounted for in machine learning. The author advocates for a legal/policy principle he names "data antisubordination," which seeks to ameliorate the risk that big data will create group disadvantages.

Lingel, Jessa, and danah boyd. "‘Keep It Secret, Keep It Safe’: Information Poverty, Information Norms, and Stigma." Journal of the American Society for Information Science and Technology 64, no. 5 (2013): 981–991

Information poverty occurs when localized conditions prevent people, such as those who are homeless or incarcerated, from reliably attaining desired information. The authors of this paper examine how social stigma around extreme body modification complicates information poverty. People who may not otherwise be "information poor" are not able to find necessary information about extreme body modification practices and safety because their community is structured around secrecy. This study shows how, in an era of highly networked information, tensions of group membership, secrecy and safety play out in terms of information access and poverty.

Marwick, Alice E., and danah boyd. "Networked Privacy: How Teenagers Negotiate Context in Social Media." New Media & Society, July 21, 2014, 1461444814543995. doi:10.1177/1461444814543995

The database structures and user interfaces of social media platforms are centered around the sharing of information, including sharing information about other people. However, many of our assumptions about how privacy functions focus on the individual as the locus of control over information. The authors of this paper examine how teenagers use social media in a networked fashion and advocate for a symmetrical model of networked privacy that treats privacy as a distributed phenomenon.

Metcalf, Jacob. "Letter on Proposed Changes to the Common Rule," December 29, 2015. http://bdes.datasociety.net/council-output/letter-on-proposed-changes-to-the-common-rule/

This public letter represents the Council for Big Data, Ethics and Society’s perspective on proposed changes to the Common Rule. In particular, it focusses on the mistaken assumption that public datasets are inherently low-risk to the subjects of data science research.

Metcalf, Jacob, and Kate Crawford. "Where Are Human Subjects in Big Data Research? The Emerging Ethics Divide." Big Data & Society 3, no. 1 (2016): 1–14. doi:10.1177/2053951716650211

Metcalf, Jacob, Emily F. Keller, and danah boyd. "Perspectives on Big Data, Ethics, and Society." Council for Big Data, Ethics, and Society, July 7, 2016. http://bdes.datasociety.net/council-output/perspectives-on-big-data-ethics-and-society/

Meyer, Michelle N. "Two Cheers for Corporate Experimentation: The A/B Illusion and the Virtues of Data-Driven Innovation." Colorado Technology Law Journal 13 (2015): 273

Narayanan, Arvind, Joanna Huey, and Edward W. Felten. "A Precautionary Approach to Big Data Privacy." In Data Protection on the Move, 357–385. Springer, 2016. http://link.springer.com/chapter/10.1007/978-94-017-7376-8_13

Because new auxiliary datasets that could be used to re-identify data are constantly released, the risk of re-identification from ad-hoc techniques rises over time. This paper shows how the risks of re-identification are not only unknown, but unknowable. For this reason, the authors advocate for a precautionary approach to privacy protection, placing the burden of proof on the releasing party to account for potential risks. Such a precautionary approach uses policy to incentivize caution with data releases.

Nissenbaum, Helen. Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford University Press, 2009

An essential contribution to data privacy theory, law and practice. This book is centered on the author’s theory of "contextual integrity," arguing that privacy is less about control over disclosure of information than about control over the flow of information.

———. "Protecting Privacy in an Information Age: The Problem of Privacy in Public." Law and Philosophy 17, no. 5 (1998): 559–596

Nissenbaum, Helen, and Solon Barocas. "Big Data’s End Run around Anonymity and Consent." In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum, 44–75. Cambridge University Press, 2014

O’Neil, Cathy. "On Being a Data Skeptic." White paper. O’Reilly Media, 2013. http://www.oreilly.com/data/free/files/being-a-data-skeptic.pdf

Makes the case for a skeptical approach to big data analytics that cuts a middle path between boosterism that ignores scientific limits and cynicism that rejects the demonstrable utility of big data. O’Neil identifies the pitfalls of trusting big data too much as: people get addicted to metrics as an easy route to actionable insight; too much focus on numbers, and not enough focus on the behaviors for which the numbers are a proxy; data analytics tends to allow incorrect or irrelevant framing of problems; and using datasets as proxies for real-world social/behavioral conditions enables perverse incentives for gaming the proxies. The pitfalls of not trusting data enough include: big data enables measurement of the value of business decisions with more specificity; big data resources put "quants" inside of business strategy; big data enables discussion of negative business news without treating it as unwarranted skepticism; missing the feedback loop between models and society.

Pepe, Alberto, Alyssa Goodman, August Muench, Merce Crosas, and Christopher Erdmann. "How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers." PLOS ONE 9, no. 8 (August 28, 2014): e104798. doi:10.1371/journal.pone.0104798

An examination of data sharing practices in astronomy, arguably the most data-intensive field of science. Using an extensive survey at a major research institution, the authors show that although astronomers are by and large philosophically supportive of data sharing, practical and infrastructural barriers limit the extent of sharing large datasets.

Polonetsky, Jules, Omar Tene, and Joseph Jerome. "Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings." Colorado Technology Law Journal 13 (June 4, 2015). https://fpf.org/wp-content/uploads/Polonetsky-Tene-final.pdf

Proposes alternative principles and procedures for research ethics review where the Common Rule and the peculiar practices of IRBs are not applicable.

Rieke, Aaron, Harlan Yu, and David G. Robinson. "Civil Rights, Big Data, and Our Algorithmic Future." White paper. Upturn. Accessed July 14, 2016. https://bigdata.fairness.io/

This report is a collection of concrete examples of how big data intersects with important civil rights issues, such as credit and insurance access, predatory financial products, and employment background checks. It helpfully illustrates how the civil rights consequences of big data technologies are dependent on the political and legal context. For example, predictive policing could have more negative civil rights outcomes because the algorithms are considered proprietary under IP law and thus are not auditable by experts.

"Santa Clara University’s Markkula Ethics Center Technology Ethics Modules." Santa Clara University’s Markkula Ethics Center Technology Ethics Modules, n.d. https://www.scu.edu/ethics/practicing/focusareas/technology/resources/Students.pdf

Useful pedagogical tools for teaching themes of technology ethics, including Internet and data ethics.

Surden, Harry. "Structural Rights in Privacy." SMUL Rev. 60 (2007): 1605

Despite our tendency to rely on formal laws and regulations to protect privacy rights, much of what actually protects privacy can be understood as latent structural constraints. These constraints operate somewhat like legal constraints by imposing excessive costs on the widespread breaking of implicit societal norms.

Swedloff, Rick. "Risk Classification’s Big Data (R)evolution." Conn. Ins. LJ 21 (2014): 339

Examines the impact of big data methods on the insurance industry. While algorithmic/machine learning predictive methods may provide more accurate risk distributions in some contexts, they also risk using unexpected and/or irrelevant correlations to impose disparate costs on vulnerable populations and may excessively invade privacy.

Taylor, Alex, Tim Regan, David Sweeney, and Siân Lindley. "Data and Life on the Street." Big Data & Society, 2014. https://www.microsoft.com/en-us/research/publication/data-and-life-on-the-street/

The Leadership Conference. "Civil Rights Principles for the Era of Big Data." Accessed July 14, 2016. http://www.civilrights.org/press/2014/civil-rights-principles-big-data.html

The Social, Cultural, & Ethical Dimensions of "Big Data." New York, NY, 2014. http://www.datasociety.net/initiatives/2014-0317/

On March 17, 2014, the Data & Society Research Institute, the White House Office of Science and Technology Policy (OSTP), and New York University’s Information Law Institute co-hosted "The Social, Cultural, & Ethical Dimensions of ‘Big Data.’" The purpose of the event was to convene key stakeholders and thought leaders from across academia, government, industry, and civil society to examine the social, cultural, and ethical implications of "big data," with an eye to both the challenges and opportunities presented by the phenomenon.

Townsend, Leanne, and Claire Wallace. "Social Media Research: A Guide to Ethics." Economic & Social Research Council, 2016. http://www.dotrural.ac.uk/socialmediaresearchethics.pdf

This report offers a framework for understanding ethical issues in social media research and provides a helpful decision tree for determining what obligations researchers have toward the subjects of data-intensive social science.

Winner, Langdon. "Do Artifacts Have Politics?" Daedalus, 1980, 121–136

This classic essay in the philosophy of technology argues that scholars should not assume technological artifacts are neutral in a sociopolitical sense. Rather, technology always embodies its sociopolitical context and scholarship about technology needs to account for that context.

Wood, David Murakami. "What Is Global Surveillance? Towards a Relational Political Economy of the Global Surveillant Assemblage." Geoforum 49 (2013): 317–326

World-Wide Telescope Project, n.d. https://www.youtube.com/watch?v=d36Ix0uQ1hg&feature=youtu.be

http://wwtstories.org/BonesOfTheMilkyway/bonesofthemilkyway.html
A collaborative astronomy project that uses big data techniques to distribute scientific inquiry.

Wu, Felix T. "Defining Privacy and Utility in Data Sets." U. Colo. L. Rev. 84 (2013): 1117

"Privacy" and "utility" are typically portrayed as having an inverse relationship in big data applications: decisions to increase privacy will typically decrease the utility of the dataset, and vice versa. This paper shows that much of this interpretation relies on the operationalization of "privacy" and "utility" within computer science experiments, which varies widely across the field.