\section{Related Work}
\label{sec.related_work}
This section surveys related kernel security approaches, such as virtualization
and other isolation mechanisms,
that aim to ensure the safe execution of privileged code in user space and kernel space.
%\lois{Justin and Yanyan: In reviewing this section, consider re-organizing this
%literature around how these sources relate to the function of Lind rather than
%just techniques used.}
%\lois{Do all of your subtopics fall into this one category? If not,
\textbf{Safety Metrics.}
Earlier research has suggested a number
of approaches to quantifying risk in the OS kernel.
Palix et al.~\cite{palix2011faults} found that \texttt{drivers} contains the
most faults in the Linux kernel, with the HAL and \texttt{fs} also being fault-prone.
Engler et al.~\cite{engler2001bugs} analyzed system code with static analysis
to look for contradictions, such as locks that are acquired but
never released.
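For illustration, the following C sketch (hypothetical code, not drawn from any kernel) shows the lock-leak pattern such checkers flag: a lock acquired on entry is never released on the early-error path.
\begin{verbatim}
#include <pthread.h>

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static int dev_state;

/* Hypothetical driver routine showing the lock-leak pattern
 * flagged by such checkers: the early-error return leaves
 * dev_lock held, so any later caller deadlocks. */
int dev_update(int value)
{
    pthread_mutex_lock(&dev_lock);
    if (value < 0)
        return -1;   /* BUG: missing pthread_mutex_unlock() */
    dev_state = value;
    pthread_mutex_unlock(&dev_lock);
    return 0;
}
\end{verbatim}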
%Studies have also been conducted based on the number of functions and/or the number of executable lines of code in the function~\cite{Mining-Metrics}.
%\lois[Isn' this a bit repetitious]
Coverage-based fault localization assumes that execution events that occur mostly in failing
test cases, but rarely in passing test cases, are more \textit{suspect}
of being the fault~\cite{jones2002visualization}. Further research in the field led to measuring the set
difference of the statements covered by passing and failing test scenarios, together with run-time events~\cite{agrawal1995fault, jones2005empirical}.
Later work showed that the likelihood of a predicate evaluating to true during a run
is statistically correlated with failure~\cite{liblit2005scalable}.
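As a concrete instantiation, the Tarantula technique~\cite{jones2002visualization, jones2005empirical} assigns each statement $s$ the suspiciousness score
\[
\mathit{susp}(s) =
\frac{\mathit{failed}(s)/\mathit{totalfailed}}
     {\mathit{passed}(s)/\mathit{totalpassed} \;+\; \mathit{failed}(s)/\mathit{totalfailed}},
\]
where $\mathit{passed}(s)$ and $\mathit{failed}(s)$ count the passing and failing test cases that execute $s$; statements executed almost exclusively by failing tests receive scores close to one.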
%\lois{This sentence is somehow incomplete}
%While these metrics can give insight on the possible places where bugs have been located in software systems, they have limitations.
%Firstly, the metrics mentioned above are not specifically designed for studying bugs in OS kernels.
%\cappos{Why does this matter?}
%Therefore, they do not take into account the way system calls are invoked by user applications.
%Moreover, these metrics can not provide information about the accurate location of the bugs within the kernel, which is important to our study.
%\cappos{Are you going to show that this is true? Can you explain more about why these do not work?}
%\cappos{Do existing metrics work on a LOC level? }
Prior work has also looked at the evolution of safety defects. For example,
Ozment et al.~\cite{ozment2006milk} and Li et al.~\cite{li2006have} have
both studied how the numbers of defects and security issues evolved over
time. Ozment reported a decrease in the rate at which new vulnerabilities
are reported by studying the age of vulnerabilities in OpenBSD. Li
reported an increase in security bugs by studying the number and relative
percentage of bugs. However, neither of these two approaches allowed
an investigation on how vulnerabilities are distributed.
%Our work is motivated by Schroter et al.~\cite{schroter2006predicting} and
%Neuhaus et al.~\cite{neuhaus2007predicting}.
Schr\"{o}ter et al.~\cite{schroter2006predicting} observed that bugs and
failures correlate with specific parts of the code. Neuhaus et al.~\cite{neuhaus2007predicting}
discovered that in Mozilla's code base
some domains are riskier than others, and that being associated with a
particular domain increases the risk of having a vulnerability.
%In this work, we apply similar observation to OS kernel design.
\textbf{Virtualization.}
\textit{Language-based virtualization.}
Programming languages that provide safety through virtualization, such as
Java, JavaScript, Lua~\cite{Lua}, and
Silverlight~\cite{Silverlight}, are commonly used for application-level
sandboxing. These languages provide virtualized environments in which
a monitor process checks code safety.
%They combine untrusted application code with an interpreter and
%standard libraries that consolidate routines to perform I/O, network
%communication, and other sensitive functions.
%For example, the Java
%Virtual Machine (JVM) \cite{JVM} functions as an application-level
%sandbox to separate untrusted code from the OS in addition to
%performing safety checks for avoiding unauthorized branching in
%memory.
%
There are also sandboxing solutions based on the type safety of programming
languages, e.g., validating code through a type checker~\cite{JS-Sandboxing}
or enforcing security policies on an untrusted system through a
reference monitor~\cite{JS-Sandboxing1}. %Compared to these approaches,
%the Lind protection model ensures stronger security than just executing the code in a browser's sandbox to isolate untrusted programs.
%Lind enforces strict policies and
%rules to validate all segments of a code before executing it through a
%dual-layer protection.
Though many sandboxes implement the bulk of standard libraries in
memory-safe languages like Java or C\#, flaws in this code can
still pose a threat~\cite{JavaBugs, Java-Lessons}.
%In fact, many security critical bugs can be found in the standard libraries
%Researchers have also
%discovered many severe bugs in Java \cite{Java-Lessons}.
Any bug or failure in a programming language virtual
machine is usually fatal. In contrast, Lind's very small TCB
provides stronger protection for privileged code: fewer lines of code mean fewer potential bugs or weaknesses.
%\lois{OK, what does the TCB have to do with these languages?}
%because of the POSIX implementation with Repy to build the sandbox.
\textit{OS virtualization.}
OS virtualization techniques can be divided into two categories:
bare-metal (Type-I) hypervisors, such as VMware ESX Server, Xen,
and Hyper-V, along with OS-level mechanisms such as LXC~\cite{LXC},
BSD's jail, and Solaris zones; and
hosted (Type-II) hypervisors, such as VMware
Workstation, VMware Server, VirtualPC, and the open-source counterpart
VirtualBox.
%
Security by isolation~\cite{Qubes, Overshadow, SecureVM, HypSec} is a feature of OS virtualization that
provides safe execution environments, through containment, for multiple
user-level virtual environments sharing the same hardware. This
approach relies on the VMM to confine untrusted applications within
guest OSes. However, it is limited by
the large attack surface exposed to the hypervisor, including
software vulnerabilities and configuration risks. Lind avoids such weaknesses through its much smaller TCB.
%For example, the
%small code footprint of a hypervisor can become a potential
%vulnerability and lead to serious problems. Maintaining two OSs in
%virtulized environments is among the other issues that add complexity
%\cite{Virt-Issues}. Attackers are able to exploit vulnerabilities to
%escape these systems and access arbitrary code in the underlying host
%systems.
%\lois{what is the connection between the above sentences and Lind?}
%Lind deals with these concerns by using a much smaller and safer TCB.
\textit{Library OS.}
Library OSes allow applications to efficiently
obtain the benefits of virtual machines,
including security isolation, host platform compatibility, and
migration.
%Libra \cite {Libra} is an example of a library OS that allows
%applications to customize their OSs within VMs.
%This solution is similar to the exokernel approach where a small trusted kernel
% is separated from the untrusted libraries
%that function as an abstraction layer between the applications and the underlying hardware.
%Libra utilizes a hypervisor instead of the exokernel to deliver efficient
%portability and performance. \lois{Ok..so what is the connection}
%
Drawbridge~\cite{Drawbridge-11} is an example of a library OS that combines a picoprocess
(a lightweight container), a security monitor that enforces policy rules,
and a library OS that presents a Windows persona to a wide variety of
Windows applications. Similar to Lind,
it restricts access from user mode to the host OS by routing a small number
of operations through the security monitor.
%\lois{How does this fit with Lind?}
%This approach brings many of the benefits of VM based temporal,
%spatial and fault isolation properties to a per-process level.
Bascule \cite{Bascule}, an architecture for library OS extensions
based on Drawbridge, allows application behavior to be customized by
extensions loaded at runtime.
Graphene~\cite{Graphene-14} is a recent library OS that
seamlessly and efficiently executes both single- and
multi-process applications, with low memory and performance overheads.
%It broadens the library OS paradigm to support secure, multi-process
%APIs, such as copy-on-write fork, signals, and System V IPC.
Haven \cite{Haven} uses a library OS to implement
shielded execution of unmodified server applications
in an untrusted cloud host.
%Haven leverages the hardware protection of
%Intel SGX to defend against
%privileged code and physical attacks, such as memory probes, but also
%addresses the dual challenges of
%executing unmodified legacy binaries and protecting them from a
%malicious host.
While the library OS technique relies extensively on
the underlying kernel to perform system functions,
Lind reimplements complex OS functionality in memory-safe Repy
code, relying on only a limited set of system functions from the Repy
sandbox kernel.
Unlike Lind, Drawbridge, Bascule,
and Haven lack a sandbox environment that properly contains
buggy or malicious behavior, and therefore could allow attackers
to trigger zero-day vulnerabilities in the underlying kernel.
%&\yanyan{please add comparison of Drawbridge/Bascule/Haven to Lind.}
\textbf{System Call Interposition} (SCI) provides
secure execution of applications by exposing a minimal kernel surface.
This approach usually enforces user-space policies through a monitor process that
interposes on system call execution~\cite{SCI-04}.
%Approaches for delegation and filtering have been extensively studied. For example,
%along with their respective tradeoffs between security and
%performance.
%Janus (J2) \cite{Janus0:96, Janus:99} is an SCI solution for confining untrusted
%applications that includes system call filtering and sandboxing mechanisms.
%Garfinkel et al. in another effort, introduces Ostia \cite{SCI-04}, where system
%calls are delegated to agents that enforce the policies.
%Ostia resolves the tight dependencies with the underlying OS through process emulation and
%the agents model at the userspace.
Known issues with SCI include the difficulty of faithfully replicating OS semantics,
race conditions introduced by multi-threaded applications, and indirect paths to resources that are easily overlooked.
In addition, denying system calls can leave applications in an inconsistent state~\cite{Problems-SCI}.
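As a concrete sketch of in-kernel system call filtering, the following C fragment installs a minimal Linux seccomp-BPF filter (illustrative only: the whitelist is chosen arbitrarily, and a production filter would also validate the architecture field):
\begin{verbatim}
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Minimal seccomp-BPF filter: allow read, write, and exit_group;
 * kill the process on any other system call.  (A production filter
 * would also check seccomp_data.arch.) */
static int install_filter(void)
{
    struct sock_filter filter[] = {
        /* Load the system call number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read,       3, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write,      2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* Required so an unprivileged process may install the filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
\end{verbatim}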
%
%Recently, Wang et al. \cite{Jitk} discuss the issues of traditional SCI systems
%in the context of in-kernel interpreters such as Seccomp, which is used by several
%security critical applications, including Chrome, OpenSSH and Tor.
%The authors propose Jitk architecture to avoid the weaknesses in traditional
%kernelspace system call interceptors that arise from the unforeseen bugs within
%the interceptors.
%Jitk provides a formal verification tool that ensure the
%correct translation of the userspace submitted code, e.g.
%the the BSD packet filter and INET-DIAG.
%The kernel module enforces policy, denies direct access to restricted
%resources, and delegates certain calls to an emulation library.
%which sends transformed system calls to the agents.
%The agent reads the policy file and handles the delegation of calls.
%
%
%Nevertheless, this technique is very useful and has inspired many new techniques, such as library OSes.
%The concepts behind system call interposition have evolved into other modern techniques and has benefited many security systems, including Lind.
%
The SCI approach is similar to Lind's isolation mechanism, but a key difference is that Lind actually executes
each system call through its own reimplementation, ensuring a safe POSIX API for user-space applications.
\textbf{Software Fault Isolation} (SFI) is an alternative to hardware memory
protection for running two applications in one address space. SFI provides sandboxing in which native
instructions can only be executed if they do not violate the sandbox's
constraints~\cite{SFI:93}; security policies are enforced through machine-level
code analysis. In this approach, memory
writes are confined to the sandbox's own data region, and code jumps cannot reach the memory of
other programs to execute their code.
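At its core, the technique can be sketched as address masking (a simplified illustration with hypothetical constants, not any particular system's implementation): before each store or indirect jump, the target address is forced back into the sandbox's region.
\begin{verbatim}
#include <stdint.h>

/* Simplified SFI-style address masking with hypothetical constants:
 * the sandbox occupies one naturally aligned 4 GiB region starting
 * at SANDBOX_BASE.  Rewriting every store (and indirect jump) to
 * pass its target through sfi_clamp() guarantees the access cannot
 * leave the region, whatever address the untrusted code computes. */
#define SANDBOX_BASE 0x100000000UL  /* hypothetical placement  */
#define SANDBOX_MASK 0x0FFFFFFFFUL /* keep only the low 32 bits */

static inline uintptr_t sfi_clamp(uintptr_t addr)
{
    return SANDBOX_BASE | (addr & SANDBOX_MASK);
}

/* A guarded store: writes always land inside the sandbox region. */
static inline void sfi_store32(uintptr_t addr, uint32_t value)
{
    *(uint32_t *)sfi_clamp(addr) = value;
}
\end{verbatim}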
%The preliminary design of SFI was build on RISC
%architectures. The authors of PittSFIeld \cite{PittSFIeld} optimized and
%extended the original SFI to support CISC architectures.
%For this purpose, the source instructions are padded with no-ops to fit, i.e.
%in the 16-byte x86 byte chunk alignment, where a call instruction is
%appended. A sequence of instructions form a instruction streams that
%ensures execution order of the sequence. The final code before
%execution will be checked by a verifier component to ensure safety.
%
%SFI has been also used in MisFIT \cite{MISFit} to ensure kernel
%module integrity for x86. The authors emphasized the
%extendability of object-oriented programming languages to eliminate
%the need for remote calls to achieve high throughput.
%
%Nooks \cite{Nooks:03} is another SFI-based solution that provides
%a protected environment for running device drivers by isolating kernel
%modules.
%Nooks' runtime environment is located within the
%kernel and it includes the majority of drivers that needs to be protected
%from each other.
%These SFI containers embody the extensions to demonstrate the integrity
%validation check in conjunction with the host environment resources.
%i.e. network nook and video nook are two protection
%domains without the possibility for memory writes outside their
%protection domain. The Nook layer function as a reference monitor
%between device devices and physical hardware by forwarding interrupts
%Nooks also wrap calls from the operating system kernel into device
%drivers and from device drivers into the kernel, allowing the
%operating system to track resource usage and verify data that is
%passed into and out of the kernel. Object tracking to allow OS for
%resource consumption is another feature of wrappers in Nooks.
%To enforce the safety constraints, programs demand a specific compiler
%in advance to validate and load the programs. For example, in
%Another SFI system called Byte
%Granularity Isolation (BGI) \cite{Castro-BGI} was introduced as a runtime tool to isolate drivers in
%separate protection SFI domains, while sharing the same address space
%between different domains.
%%BGI mainly ensures write-integrity,
%%control-flow integrity and type safety for kernel objects. For this
%%purpose, a BGI compiler is required to transform the unmodified driver
%%source code to instrumented driver. The instrumented driver will be
%%linked to the BGI interposition libraries that enforce the protection
%%constraints to produce the BGI drivers. The protection constraints
%%that are enforced by the interposition library are implemented in
%%access control lists (ACL). The ACLs contain policies for each byte of
%%virtual memory and domain permissions.
%A major issue of BGI is that it
%does not solve the vulnerabilities at kernel-levels.
Google's Native Client (NaCl)~\cite{NaCl-09} is an SFI system for the
Chrome browser that allows native executable code to run directly in the
browser. NaCl prevents suspicious code
from corrupting memory or directly accessing the underlying system
resources. To this end, NaCl loads untrusted modules and
trusted modules into two different address spaces, whereas most SFI
approaches load both untrusted and trusted code into a common
address space.
%Mao et al. \cite{LXFI} study the shortcomings of XFI \cite{XFI} and
%BGI \cite{Castro-BGI} kernel module isolation mechanisms that separate
%kernel modules from the core kernel. Integrity of API calls was one of
%the challenging issues in existing solutions. For example in XFI there
%were no argument validation. LXFI solution was proposed as an
%extension of existing SFI solutions with language annotation. LXFI
%added argument integrity in XFI and call back integrity in BGI in
%addition to enabling programmers for specifying principals for access
%permissions within a module.
%
%In another effort \cite{PSFI}, Kroll et al. discuss the issues of
%machine-level SFI that causes portability limitations across different
%hardware architectures. The authors propose portable SFI through a
%higher level of abstraction at the compiler layer. To this end,
%programs are rewritten by an intermediate-level language compiler to
%comply with policies.