This project implements the following paper:
Eugene Bagdasaryan, Griffin Berlstein, Jason Waterman, Eleanor Birrell, Nate Foster, Fred B. Schneider, Deborah Estrin. Ancile: Enhancing Privacy for Ubiquitous Computing with Use-Based Privacy. WPES, 2019.
Ask questions through GH Issues, our Slack, or just email ([email protected]).
Widespread deployment of Intelligent Infrastructure and the Internet of Things (IoT) creates large quantities of passively generated data. This has ushered in an era of data-rich applications, such as location-based services, while posing new privacy threats. This project explores the challenges that arise in applying use-based privacy to such data. We have developed Ancile, a platform that enforces use-based privacy for applications that wish to access users' personal data. We find that Ancile constitutes a functional, performant platform for deploying privacy-enhancing ubiquitous computing applications.
Ancile supports the development of privacy-aware applications. It acts as a trusted computing environment, ensuring that users' data is used only in ways that comply with a well-defined, use-based privacy policy. Ancile enforces this policy as a middleware layer that sits between personal data sources (e.g., a mail server or a location data server) and third-party applications that wish to use such data in a privacy-compliant manner. We currently support Python and work with any OAuth service.
Our system allows applications to submit an arbitrary Python program that requests data from Ancile-registered data sources. Upon receiving this program, Ancile fetches the policy and access tokens associated with the user and the data source. Ancile then attempts to execute the application's program in a restricted environment, enforcing the policies. If the program completes without policy violations, its result is returned to the application.
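The request flow above can be sketched roughly as follows. This is only an illustration, not Ancile's implementation: the function name `run_submitted_program` and the response dictionary shape are hypothetical, policy checks are omitted, and `exec` with a trimmed namespace is not a security boundary (Ancile relies on RestrictedPython for that).

```python
from typing import Any, Callable, Dict


def run_submitted_program(
    source: str, allowed: Dict[str, Callable[..., Any]]
) -> Dict[str, Any]:
    """Run a submitted program with only the given functions in scope.

    Hypothetical helper for illustration; it does NOT sandbox the code.
    """
    result: Dict[str, Any] = {}

    def return_to_app(data: Any) -> None:
        # Record what the program wants to send back to the application.
        result["data"] = data

    # An empty __builtins__ hides open(), __import__ and friends from the
    # submitted program; only the explicitly allowed functions are visible.
    namespace = {"__builtins__": {}, "return_to_app": return_to_app, **allowed}
    try:
        exec(source, namespace)
    except Exception as exc:
        return {"status": "error", "error": f"{type(exc).__name__}: {exc}"}
    if "data" not in result:
        return {"status": "error", "error": "program returned no data"}
    return {"status": "ok", "data": result["data"]}


# Example: a stand-in fetch function and a two-line submitted program.
program = "dpp = fetch_data(user='user')\nreturn_to_app(dpp)"
response = run_submitted_program(
    program, {"fetch_data": lambda user: {"owner": user}}
)
```

A program that calls anything outside the allowed set, such as `open`, ends up in the error branch instead of returning data.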
Use-based privacy (Birrell et al.) focuses on preventing harmful uses (NYTimes) rather than restricting access to data. The application gets to use all necessary data for non-harmful purposes. Each data point in Ancile has a policy that specifies which uses are permitted. Furthermore, the framework is reactive: the policy attached to the data changes after each transformation is applied.
- Company's data -- data collected by a company's internal services, such as email and location data. Novel third-party applications propose new services such as workplace optimization, person and room finders, and depression or suicide prevention. These services require access to sensitive data, but the access they are given is usually broader than they need. For example, a service that reports nearby available rooms does not need constant access to user location data. Unrestricted release of raw data can enable malicious uses in which a user's location is accessed after hours or outside the office. Ancile addresses this problem by letting a policy on the user's location data share it only during specific hours or at a specific location.
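As a rough illustration of the restriction described in this use case, the check below releases a location sample only during office hours and only inside the office. It is plain Python, not Ancile's policy language; the `LocationSample` record, the `release_if_permitted` helper, the building name, and the hours are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass(frozen=True)
class LocationSample:
    """Hypothetical location record; field names are illustrative only."""
    building: str
    timestamp: datetime


def release_if_permitted(
    sample: LocationSample,
    office_building: str = "HQ",   # assumed office identifier
    start: time = time(9, 0),      # assumed start of office hours
    end: time = time(17, 0),       # assumed end of office hours
) -> Optional[LocationSample]:
    """Return the sample only if it was taken in the office during office hours."""
    in_hours = start <= sample.timestamp.time() < end
    in_office = sample.building == office_building
    return sample if (in_hours and in_office) else None


# A mid-morning sample in the office is released; a late-night one is not.
morning = release_if_permitted(LocationSample("HQ", datetime(2024, 1, 8, 10, 30)))
night = release_if_permitted(LocationSample("HQ", datetime(2024, 1, 8, 22, 0)))
```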
We define three roles:
- Admin -- responsible for configuring Ancile, approving applications, and maintaining user policies
- Application -- needs access to users' sensitive data
- User -- owns sensitive information that is available through OAuth endpoints
Once Ancile is installed we assume the following sample workflow:
- Admin configures Ancile and connects OAuth-enabled data sources
- User registers on Ancile and performs OAuth authentication with the required data sources
- Application developer registers on Ancile
- User picks a policy associated with the application and the connected data source
- Application sends a Python program that requests the user's data
- Ancile executes the program under the associated policy and, if successful, returns the data to the application; otherwise it returns an error
Policies define an automaton whose state changes as operations are applied to the data. For example, applying a transformation that fuzzes the location can enable a larger set of further operations on that data.
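A toy version of such an automaton is sketched below. The state names, the command names other than those used elsewhere in this README, and the transition table are hypothetical; the point is only that fuzzing moves the data into a state from which more commands, including release, become available.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical automaton: state and command names are illustrative only.
# Raw location data may only be fuzzed; fuzzed data may be checked
# against a geofence or returned to the application.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("raw", "fuzz_location"): "fuzzed",
    ("fuzzed", "in_geofence"): "fuzzed",
    ("fuzzed", "return_to_app"): "released",
}


def advance(state: str, command: str) -> str:
    """Move to the next policy state, or fail if the command is not allowed."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise PermissionError(f"{command!r} is not allowed in state {state!r}")


def run(commands: Iterable[str], state: str = "raw") -> str:
    """Apply a sequence of commands and return the final policy state."""
    for command in commands:
        state = advance(state, command)
    return state


final_state = run(["fuzz_location", "in_geofence", "return_to_app"])
```

Calling `run(["return_to_app"])` on raw data raises `PermissionError`, because no transition releases unfuzzed data.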
Our policy is defined as a regular expression over an alphabet of operations (Python commands), built from the following operators:
- Sequence -- commandA . commandB: the program may call commandB only after calling commandA.
- Union -- commandA + commandB: either of the two commands can be invoked.
- Intersection -- commandA & commandB: both commands need to match.
- Iteration -- commandA*: the command can be repeated multiple times.
- Negation -- !commandA: any command except commandA can be invoked.
We use Brzozowski derivatives, which let us advance the regular expression each time a command is called. The approach relies on two key operations: the D-step, which applies whenever a command is invoked, and the E-step, which applies only when the application wants to get data back from Ancile.
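The sketch below shows the textbook form of Brzozowski derivatives over a small policy language. It is not Ancile's code: the class names are hypothetical, `d_step` is the standard derivative with respect to one command, and `e_step` is assumed here to correspond to the standard test of whether the remaining policy accepts the empty sequence. Negation is left out because its exact semantics in Ancile are not spelled out in this README.

```python
from dataclasses import dataclass
from typing import Iterable


class Policy:
    """Base class for policy expressions (hypothetical names throughout)."""


@dataclass(frozen=True)
class Empty(Policy):
    """Matches nothing: the policy can no longer be satisfied."""


@dataclass(frozen=True)
class Done(Policy):
    """Matches only the empty sequence of commands."""


@dataclass(frozen=True)
class Cmd(Policy):
    name: str


@dataclass(frozen=True)
class Seq(Policy):      # left . right
    left: Policy
    right: Policy


@dataclass(frozen=True)
class Alt(Policy):      # left + right (union)
    left: Policy
    right: Policy


@dataclass(frozen=True)
class Inter(Policy):    # left & right (intersection)
    left: Policy
    right: Policy


@dataclass(frozen=True)
class Star(Policy):     # inner*
    inner: Policy


def e_step(p: Policy) -> bool:
    """True if the policy accepts the empty sequence, i.e. may end here."""
    if isinstance(p, (Done, Star)):
        return True
    if isinstance(p, (Empty, Cmd)):
        return False
    if isinstance(p, (Seq, Inter)):
        return e_step(p.left) and e_step(p.right)
    if isinstance(p, Alt):
        return e_step(p.left) or e_step(p.right)
    raise TypeError(f"unknown policy node: {p!r}")


def d_step(p: Policy, command: str) -> Policy:
    """Derivative of the policy with respect to one invoked command."""
    if isinstance(p, (Empty, Done)):
        return Empty()
    if isinstance(p, Cmd):
        return Done() if p.name == command else Empty()
    if isinstance(p, Seq):
        head = Seq(d_step(p.left, command), p.right)
        # If the left part may be skipped, the command can also start the right part.
        return Alt(head, d_step(p.right, command)) if e_step(p.left) else head
    if isinstance(p, Alt):
        return Alt(d_step(p.left, command), d_step(p.right, command))
    if isinstance(p, Inter):
        return Inter(d_step(p.left, command), d_step(p.right, command))
    if isinstance(p, Star):
        return Seq(d_step(p.inner, command), p)
    raise TypeError(f"unknown policy node: {p!r}")


def accepts(policy: Policy, commands: Iterable[str]) -> bool:
    """Advance the policy through every command, then apply the E-step."""
    for command in commands:
        policy = d_step(policy, command)
    return e_step(policy)


policy = Seq(Cmd("fetch_location"), Seq(Cmd("fuzz_location"), Cmd("return_to_app")))
allowed = accepts(policy, ["fetch_location", "fuzz_location", "return_to_app"])
skipped_fuzz = accepts(policy, ["fetch_location", "return_to_app"])
```

Here `allowed` is `True` and `skipped_fuzz` is `False`, since the second program tries to return data without fuzzing it first. A production version would also simplify expressions after each step so that they do not grow without bound.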
In Ancile, data travels with its policy in a special container, DataPolicyPair. This object is protected using the RestrictedPython framework. To obtain a user's data, the developer submits the following program:
dpp = fetch_data(user=user('[email protected]'))
This stores the fetched data in the object dpp. The developer can execute only functions that the policy allows. For example, if the policy specifies:
transform.return_to_app
for some commands transform and return_to_app, then the following program will work:
dpp1 = transform(data=dpp)
return_to_app(dpp1)
The return_to_app command is special: it must be the last command in the policy, and if it succeeds, Ancile returns the data to the application.
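To make the idea of data travelling with its policy concrete, here is a toy container. It is a stand-in, not Ancile's DataPolicyPair: the class `ToyDataPolicyPair`, its fields, and its `apply` method are hypothetical, and the policy is simplified to a fixed sequence of command names rather than a full regular expression.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class ToyDataPolicyPair:
    """Toy stand-in for Ancile's container; the real class differs."""
    data: Any
    remaining: Tuple[str, ...]  # commands still permitted, in order

    def apply(self, name: str, func: Callable[[Any], Any]) -> "ToyDataPolicyPair":
        """Run func on the data only if name is the next permitted command."""
        if not self.remaining or self.remaining[0] != name:
            raise PermissionError(f"policy does not allow {name!r} here")
        # The result carries the rest of the policy with it.
        return ToyDataPolicyPair(func(self.data), self.remaining[1:])


def return_to_app(dpp: ToyDataPolicyPair) -> Any:
    """Release the data, but only as the final step of the policy."""
    released = dpp.apply("return_to_app", lambda d: d)
    if released.remaining:
        raise PermissionError("return_to_app must be the last step")
    return released.data


# Policy "transform.return_to_app" expressed as a plain sequence.
dpp = ToyDataPolicyPair({"x": 1.4}, ("transform", "return_to_app"))
dpp1 = dpp.apply("transform", lambda d: {"x": round(d["x"])})
result = return_to_app(dpp1)
```

Calling `return_to_app(dpp)` directly, before the transform, raises `PermissionError`, which mirrors how a program that skips a required step is rejected.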
Ancile supports custom functions, as well as normal third-party libraries, to be controlled by the policies. All custom functions have to be defined under ancile/lib/.
We use three different types of functions:
- Fetch functions: annotated with @ExternalDecorator(), these functions can get the OAuth token for the user and perform external calls.
- Transformation functions: annotated with @TransformDecorator(), these functions take a DataPolicyPair object and return a transformed DataPolicyPair object.
- Return functions: annotated with @UseDecorator(), these functions take a DataPolicyPair object and return it if successful.
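The three roles can be imitated with ordinary Python decorators, as in the sketch below. Everything here is hypothetical and is not Ancile's API: `toy_external`, `toy_transform`, and `toy_use` only mimic the division of labour, the token and policy tables are hard-coded stand-ins, and the policy is a plain sequence of function names.

```python
import functools
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class ToyPair:
    data: Any
    remaining: Tuple[str, ...]  # function names still permitted, in order


# Hard-coded stand-ins for the token store and the per-user policies.
TOKENS = {"user": "fake-oauth-token"}
POLICIES = {"user": ("fetch_location", "round_coords", "return_to_app")}


def _advance(remaining: Tuple[str, ...], name: str) -> Tuple[str, ...]:
    if not remaining or remaining[0] != name:
        raise PermissionError(f"policy does not allow {name!r} here")
    return remaining[1:]


def toy_external(func):
    """Fetch role: inject the user's token and wrap the result with the policy."""
    @functools.wraps(func)
    def wrapper(user: str, **kwargs):
        rest = _advance(POLICIES[user], func.__name__)
        return ToyPair(func(token=TOKENS[user], **kwargs), rest)
    return wrapper


def toy_transform(func):
    """Transformation role: pair in, transformed pair out."""
    @functools.wraps(func)
    def wrapper(data: ToyPair, **kwargs):
        rest = _advance(data.remaining, func.__name__)
        return ToyPair(func(data.data, **kwargs), rest)
    return wrapper


def toy_use(func):
    """Return role: release the data only if the policy ends here."""
    @functools.wraps(func)
    def wrapper(data: ToyPair, **kwargs):
        if _advance(data.remaining, func.__name__):
            raise PermissionError(f"{func.__name__!r} must be the last step")
        return func(data.data, **kwargs)
    return wrapper


@toy_external
def fetch_location(token):
    # A real fetch function would call the data source with the token.
    return {"x": 12.3456, "y": 7.891}


@toy_transform
def round_coords(data, digits=1):
    return {key: round(value, digits) for key, value in data.items()}


@toy_use
def return_to_app(data):
    return data


released = return_to_app(round_coords(fetch_location(user="user"), digits=1))
```

The design point is that the decorated functions never see the policy themselves; the wrappers check and advance it, so library code stays free of enforcement logic.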
Beyond these functions, we also support conditional and collection operations, which we will introduce later.
Here are the installation instructions.
We have a development environment running at https://dev.ancile.smalldata.io, so please feel free to explore it. A few test accounts are set up for exploration: user/user_password and app/app_password.
- Log in with the app credentials
- Choose the app view in the top-right corner
- Click on Console in the left bar
- Pick the first app
- Specify the user as user and press Enter
- My user has the following policy:
fetch_location.fuzz_location.return
- Enter the following program and click Run:
dpp1 = indoor_location.fetch_location(user=user('user'))
dpp2 = indoor_location.fuzz_location(data=dpp1['location'], mean=0, std=0.2)
return_to_app(data=dpp2['sta_location_x'])
- You will get my distorted location.