Monday, May 2, 2016

Global Legal Technology Laboratory Event

Update: The conference tracks have been announced and will be as follows:

Track 1) Legal document automation 
Track 2) Data analytics 
Track 3) Semantic systems 
Track 4) Smart transactions 

The Global Legal Technology Laboratory (GLTL)

On May 5 and 6, 2016, we are proud to host the Global Legal Technology Laboratory (GLTL) for a day and a half of knowledge sharing and creative ideation.

This event is intended to demonstrate the multi-faceted potential of the Global Legal Technology Laboratory (GLTL) and to provide momentum for its growth. The GLTL has been initiated by the UMKC School of Law, Queen Mary University-London, Brooklyn Law School, and MIT/law as a collaboration of faculty and students at law schools, other academic institutions, and industry collaborators. Its aim is to facilitate the development, building, refining, vetting, testing and improvement of prototype technologies at the intersections of law, public policy, and innovation, and to generate data and data analytics processes that inform research on barriers to and facilitators of innovation and entrepreneurship.


Wednesday, March 25, 2015

Emerson and Connection Science

Facts are facts, or so the saying goes, but some facts are different. In a sense, all facts are undifferentiated raw data. Yet some factual information is legally considered personal data, and a variety of protections, rights, obligations and other expectations may apply to it. Other facts may be considered "Public Record" and even published freely on the Internet as "Open Data".

The nature and definition of public and private facts is highlighted in a simple and powerful way by Emerson: 
"In like manner all public facts are to be individualized, all private facts are to be generalized. Then at once History becomes fluid and true, and Biography deep and sublime." Emerson, Ralph Waldo. Essays - First Series
In a very literal sense, the personal data of an individual is "generalized" when de-identified and aggregated with a lot of other like data. This process is common and considered a basic pillar of privacy protection. 
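
A minimal sketch of that kind of generalization, using made-up values (the column names, age ranges and library choice are illustrative assumptions, not any particular de-identification standard):

    import pandas as pd

    # Illustrative only: a handful of made-up records, not real personal data.
    records = pd.DataFrame({
        "age": [34, 37, 41, 44, 52, 58],
        "zip_code": ["02139", "02139", "02142", "02142", "02139", "02142"],
        "rides_taken": [12, 7, 3, 9, 5, 11],
    })

    # "Generalize" the private facts: replace exact ages with coarse ranges,
    # then aggregate so that no output row describes a single identifiable person.
    records["age_range"] = pd.cut(records["age"], bins=[30, 40, 50, 60],
                                  labels=["30-39", "40-49", "50-59"])
    generalized = (records
                   .groupby(["age_range", "zip_code"], observed=True)["rides_taken"]
                   .sum()
                   .reset_index())

    print(generalized)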

Distinguishing public from private facts can, in some cases, be challenging. And generalization methods for protecting private personal information have serious limits as more and more options emerge for re-identification. 

Nonetheless, the phrasing of this basic principle by Emerson offers fresh perspectives for potential new approaches to fair information management. A system that individualizes public facts well would presumably have collected, and rendered usable, much related data in a way that sheds light on the individual context and the surrounding situation.

Individualization of data goes both inward and outward. Individualizations that go inward and deeper include, for example, adding specific identity attribute data about a user account or data subject. Adding ever finer grained details of the data itself, such as noting the precise file size or attaching a cryptographic hash digest that is unique to that single blob of data, can individuate an otherwise identical file from all other files.
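
As a minimal sketch of this inward individuation (the file name in the usage comment is hypothetical), the exact byte size and a SHA-256 digest can be computed for any data file:

    import hashlib
    import os

    def individuate_file(path: str) -> dict:
        """Compute inward-individualizing details for a file:
        its exact byte size and a SHA-256 digest unique to its contents."""
        sha256 = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                sha256.update(chunk)
        return {
            "path": path,
            "size_bytes": os.path.getsize(path),
            "sha256": sha256.hexdigest(),
        }

    # Example (hypothetical file name):
    # print(individuate_file("transit_ridership_2015.csv"))

Two files with the same name still yield different digests if their contents differ by even one byte, so the digest individuates the data itself rather than its label.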

Perhaps more important than the inward detailing is the outward individualization of public facts, because it can provide social meaning and more cues for business, legal and technical understanding of the facts in their actual context. This can mean including or linking to external data, such as timestamps at the nanosecond scale or fine grained location data. Absolute location (e.g. latitude and longitude) can itself be significantly fine tuned and detailed with proximity data, e.g. from nearby Bluetooth connection attempts, RFID readers and wireless hotspot routers. Information about the business or legal context of a given piece of data likewise provides significant individualization. Documenting the owner of the device that created the data, the project that collected the data or the entity that funded the activity that created the data might allow a citizen at a public data portal to find and understand transportation, educational or other types of data in a much more meaningful and valuable way.

Public facts can be individualized by including, as metadata or links in documentation, associations with other data that shed light on the relevant surrounding circumstances. These associations can be discovered and computed by software when they are presented in a structured, standard manner. Individualization of public facts published as open data can be powerfully achieved by noting, in a standard way, the key people and interactions relevant to the data. Linking data to relevant people and transactions allows all subsequent users of the data to sort, filter and search based on widely varied, flexible sets of factors revealing broader situations and intertwining circumstances.
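
A minimal sketch of such a structured record, expressed here as JSON built in Python (every field name, value and URL is a hypothetical placeholder rather than an existing standard or data set):

    import json
    from datetime import datetime, timezone

    # Illustrative only: these field names are assumptions, not a published metadata standard.
    record = {
        "dataset": "bus_stop_boardings_2015.csv",
        "content_sha256": "<digest from the inward-individuation sketch above>",
        "captured_at": datetime(2015, 3, 25, 14, 30, 12, tzinfo=timezone.utc).isoformat(),
        "location": {
            "lat": 42.3601,
            "lon": -71.0942,
            "proximity_hints": ["bt:beacon-17", "wifi:ap-main-st-03"],
        },
        "provenance": {
            "device_owner": "City Transit Department",
            "collecting_project": "Open Transit Counts",
            "funding_entity": "Example Mobility Grant",
        },
        "linked_records": [
            "https://data.example.gov/budgets/2015/transit",
            "https://data.example.gov/contracts/transit-sensors",
        ],
    }

    print(json.dumps(record, indent=2))

Because the associations are expressed as structured fields and links, software can traverse, filter and cross-reference them without human curation.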

Emerson may have unknowingly offered a very timely and important functional capability goal for the current wave of open data adoption. As cities, states and federal agencies push more and more data out, anchoring to the axiom as stated by Emerson may provide the missing design requirements needed to build out systems that are both fair and functional. Generalizing and appropriately obfuscating private facts such as personal data is a must and always worth repeating. The balancing parallel admonition to render public facts individualized completes a coherent and buildable design requirement and can serve as a kind of axiom for those investing in the deployment of large scale open data projects around the world.

Ensuring that open data about public facts includes documentation and links to the other data needed to relate it to relevant individual data is what individualizes it. This is fundamental to making data usable, understandable and valuable, because it enables others to find, filter, sort and cross-tabulate any combination of relevant data.

Individualizing data about public facts enables connections to context, meaning, insight and value. 

Individualizing open data is fundamentally a matter of Connection Science.


Monday, February 23, 2015

Open API and Public Apps as Law and Policy for Innovation and Competition

New York City Council member Ben Kallos advocated use of an Open API (enabling many innovative, competitive and different apps) and a new "E-Hail" NYC branded public App to provide an accessible, fair, free baseline service to anchor a growing, dynamic economic marketplace of mobile apps for transportation services. This creative approach could provide the basic formula for a method of law and policy making that works better for the information age, not only in transportation but in other markets as well.

Tuesday, February 3, 2015

Public Data Time Series of US Code


Use the accompanying code to explore the US Code draft data set provided by law.MIT.edu.
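
As a rough illustration of the kind of time-series exploration this enables (the file name and column layout below are assumptions, not the actual structure of the law.MIT.edu data set):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical layout: one row per (year, title) with a count of sections.
    # The file name and column names are assumptions, not the real schema.
    df = pd.read_csv("us_code_time_series.csv")  # columns: year, title, section_count

    # Total number of sections in the US Code for each year in the series.
    totals = df.groupby("year")["section_count"].sum()
    print(totals)

    totals.plot(title="US Code sections over time (illustrative)")
    plt.xlabel("Year")
    plt.ylabel("Number of sections")
    plt.show()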

Virtualized Organizations

To successfully complete the transition to digital business in networked economies, key legal, financial and operational aspects of organizations still need to be brought online in a universalizable manner.



This video presents a conceptual approach to designing totally virtualized (distributed and automated) organizations:




This video discussion with leaders of the Inquiri and Loomio collaboration platforms explores opportunities and options for making deliberation and decision making functionality available as a common and reusable capability expressed as a REST service:
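
As a rough sketch of what such a reusable REST capability might look like (the endpoint paths, fields and in-memory storage are illustrative assumptions, not the Inquiri or Loomio APIs), here is a minimal Python service:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # In-memory store for illustration: proposal_id -> {"title": str, "votes": {participant: choice}}
    proposals = {}

    @app.route("/proposals", methods=["POST"])
    def create_proposal():
        data = request.get_json()
        proposal_id = str(len(proposals) + 1)
        proposals[proposal_id] = {"title": data["title"], "votes": {}}
        return jsonify({"id": proposal_id}), 201

    @app.route("/proposals/<proposal_id>/votes", methods=["POST"])
    def cast_vote(proposal_id):
        data = request.get_json()
        proposals[proposal_id]["votes"][data["participant"]] = data["choice"]
        return jsonify({"recorded": True})

    @app.route("/proposals/<proposal_id>", methods=["GET"])
    def get_proposal(proposal_id):
        p = proposals[proposal_id]
        tally = {}
        for choice in p["votes"].values():
            tally[choice] = tally.get(choice, 0) + 1
        return jsonify({"title": p["title"], "tally": tally})

    if __name__ == "__main__":
        app.run(port=5000)

Any front-end application could then POST a proposal, let participants POST votes, and GET the running tally, which is what makes the capability common and reusable.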




This GitHub repository explores an approach to prototyping and testing a "Distributed Autonomous Organization" (DAO) using the type of blockchain underpinning Bitcoin:
https://github.com/DAOhub/framework





Toward Reproducible Computational Law

Reproducibility will be an especially critical component of Computational Law, given the underlying context of rights, responsibilities, requirements, recourse and remedies.

This series of posts is intended to raise awareness, catalyze and foster development of a common understanding about how and when to create and preserve documentation needed to achieve reproducible Computational Law methods and results.
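
As one rough illustration of the kind of documentation this might involve (the manifest fields, file name and license choices below are assumptions, not an established standard), a small script can preserve the data digest, code version and environment alongside each result:

    import hashlib
    import platform
    import subprocess
    from datetime import datetime, timezone

    def sha256_of(path: str) -> str:
        # Digest of the exact input file, so others can verify they analyze the same data.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    def build_manifest(data_path: str, result_summary: dict) -> dict:
        # Preserve the data digest, code version, environment and license terms
        # alongside a result so the computation can later be re-run and checked.
        return {
            "created": datetime.now(timezone.utc).isoformat(),
            "data_file": data_path,
            "data_sha256": sha256_of(data_path),
            "code_commit": subprocess.check_output(
                ["git", "rev-parse", "HEAD"]).decode().strip(),
            "python_version": platform.python_version(),
            "licenses": {"code": "MIT", "data": "CC0"},  # permissive terms, as the passage quoted below urges
            "result_summary": result_summary,
        }

    # Hypothetical usage:
    # print(build_manifest("analysis_input.csv", {"finding": "example summary"}))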

Professor Victoria Stodden has articulated the case for reproducibility as a key requirement for computational science generally. Professor Stodden's recent book "Implementing Reproducible Research" (see below) advocates:
"In computational science, reproducibility requires that researchers make code and data available to others so that the data can be analyzed in a similar manner as in the original publication. Code must be available to be distributed, data must be accessible in a readable format, and a platform must be available for widely distributing the data and code. In addition, both data and code need to be licensed permissively enough so that others can reproduce the work without a substantial legal burden."
Professor Stodden has been kind enough to informally offer valuable advice about the scope and direction of law.MIT.edu prior to launch, and she has been an important collaborator on a flagship book on Big Data Privacy. As a start toward building a coherent framework for reproducibility of Computational Law, we will explore the following materials by Professor Stodden:


Data and Code Sharing in Bioinformatics: From Bermuda to Toronto to Your Laptop


Abstract

Large-scale sequencing projects paved the way for the adoption of pioneering open data policies, making bioinformatics one of the leading fields for data availability and access. In this talk I will trace the history of open data in -omics based research, and discuss how open code as well as data are being addressed today. This will include discussing leading edge tools and computational infrastructure developments intended to facilitate reproducible research through workflow tracking, computational environments, and data and code sharing. [March 13, 2014 talk by Professor Victoria Stodden, Department of Statistics, Columbia University. The talk was given at UC Berkeley.]



Implementing Reproducible Research

Summary


Implementing Reproducible Research covers many of the elements necessary for conducting and distributing reproducible research. It explains how to accurately reproduce a scientific result.

Divided into three parts, the book discusses the tools, practices, and dissemination platforms for ensuring reproducibility in computational science. It describes:

  • Computational tools, such as Sweave, knitr, VisTrails, Sumatra, CDE, and the Declaratron system
  • Open source practices, good programming practices, trends in open science, and the role of cloud computing in reproducible research
  • Software and methodological platforms, including open source software packages, RunMyCode platform, and open access journals

Additional features of the book:

  • Chapters are fully reproducible with material available on the editors’ website.
  • Covers the three principal areas of reproducible research: tools, practices, and platforms
  • Contains contributions from leading figures in the field
  • Explores the use of reproducible research in bioinformatics and large-scale data analyses
  • Provides case studies and advice on best practices and legal issues, including recommendations of the Reproducible Research Standard


Each part presents contributions from leaders who have developed software and other products that have advanced the field. Supplementary material is available at www.ImplementingRR.org.




Open Data and Reproducibility in Research: Victoria Stodden @ OpenCon 2014

Talk delivered November 15th, 2014 at OpenCon 2014, the student and early career researcher conference on Open Access, Open Education, and Open Data.

OpenCon is organized by the Right to Research Coalition, SPARC (The Scholarly Publishing and Academic Resources Coalition), and an Organizing Committee of students and early career researchers from around the world. [This video was published by the Right to Research Coalition]


Reproducing Statistical Results

Talk given October 23rd, 2014 to the Berkeley Initiative for Transparency in the Social Sciences (BITSS) on framing transparency in research in historical context and across various disciplines.


Toward Reliable and Reproducible Inference in Big Data

Abstract

The 21st century is surely the century of data. New technologies now permit the collection of data by virtually all scientific, educational, governmental, societal and commercial enterprises, leading to an explosion not only in the amount of data but also in its diversity, complexity, and velocity. The opportunities for information and knowledge extraction from these data are enormous; however, they present new challenges to reproducibility and verifiability. In this talk I will outline issues in reproducibility in the big data context and motivate both technical and nontechnical solutions. I will present ResearchCompendia.org, a tool I have been collaboratively developing to both persistently associate data and code with published findings, and verify those findings. I will also present recent empirical research intended to illuminate data and code sharing practices and inform policy steps to enable really reproducible research. [This talk was delivered December 1st, 2014 to the NCSA Colloquium]


We expect to add more background and resource links on this topic. You are invited to share your comments, questions, ideas or helpful background resources.