README
================

We provide instructions to enable the evaluation of the artifact associated with our CGO'25 tool paper, titled "ACE: An FHE Compiler Framework for Automating Neural Network Inference." This paper presents ACE, an open-source FHE compiler that converts ONNX models into equivalent FHE models to perform encrypted inference (https://ace-compiler.github.io/).

To paraphrase a paragraph from our author response for this CGO'25 tool paper: ACE is the first FHE compiler to automatically compile ONNX models to C/C++ using the CKKS scheme for CPUs. It has been evaluated using a series of six ResNet models, including ResNet-110, the most complex model employed in FHE compiler research. Developed as an open-source tool through 44 man-months of collaborative engineering by several experts, ACE is poised to significantly benefit the compiler community in this critical area.

In our evaluation, we compared the ACE compiler with expert hand-tuned implementations using six ResNet models: ResNet-[20|32|44|56|110] on CIFAR-10 and ResNet-32 on CIFAR-100 (referred to as ResNet-32*). The objective of this artifact evaluation is to reproduce our results, presented in Figures 5-7 and Tables 9-10:
- **Figure 5**: Compile times achieved by ACE
- **Figure 6**: Comparison of encrypted inference times between ACE and expert implementations
- **Figure 7**: Comparison of memory usage between ACE and expert implementations
- **Table 9**: Security parameters selected for the CKKS scheme by ACE
- **Table 10**: Comparison of accuracy between encrypted inference via the ACE compiler and unencrypted inference

*Let us begin by noting that performing artifact evaluation for FHE compilation, especially for encrypted inference, is challenging due to the substantial computing resources and significant running times required.*

It is essential to emphasize that FHE remains up to 10,000 times slower than unencrypted computation, even for small machine learning models. To achieve the results presented in Table 10, we tested 1,000 images for each of the six ResNet models. Performing these tests would require approximately 5,000 hours (over 208 days) if conducted sequentially using a single thread on one CPU core. To manage this extensive computational demand efficiently, we conducted encrypted inference tests in parallel using multi-core systems.
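
*(For a sense of scale: 6 models × 1,000 images = 6,000 encrypted inferences, so 5,000 hours works out to roughly 50 minutes per encrypted inference on a single thread.)*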

Generating Figures 5-7 and Table 9 takes **approximately 18 hours**.
However, reproducing Table 10 presents a significant challenge due to the computational intensity required for artifact evaluation. To facilitate this, we have provided a script that generates the table using only 10 images per model. On a computing platform equipped with 10 cores (details provided below), completing this process is expected to take **approximately 7 hours**. Please note, however, that the results obtained with this abbreviated method should be considered approximate. For those who wish to conduct tests using 1,000 images per model, please be aware that this extended evaluation will take **over 140 hours** on a 64-core platform.

*It is important to note that, like existing FHE compilers, the ACE compiler achieves accuracy in encrypted inference comparable to that of unencrypted inference. Table 10 is included for completeness and does not represent a contribution of this paper. Table 9 simply lists the security parameters used by the ACE compiler. The major results of this paper are presented in Figures 5-7: Figure 5 presents the compile times for the six ResNet models. Figures 6 and 7 compare the ACE compiler to expert hand-tuned implementations, focusing on per-image encrypted inference time and memory usage for each model, respectively.*

*Note also that the ACE compiler compiles all six ResNet models within seconds. As a result, the Figure 5 you obtain may differ slightly from Figure 5 in our paper. Similarly, your Figure 6 may show minor variations from ours. However, the overall trends in both Figures 5 and 6 will remain consistent.*

To facilitate artifact evaluation, we provide detailed steps, environment setup, and execution guidelines to ensure that the findings of our research can be independently verified.

**Hardware Setup:**
- Intel Xeon Platinum 8369B CPU @ 2.70 GHz
- 512 GB memory

**Software Requirements:**
- Detailed in the [*Dockerfile*](https://github.com/ace-compiler/ace-compiler/blob/main/Dockerfile) (tested with Docker version 25.0.1)
- Docker image based on Ubuntu 20.04

Encrypted inference is both compute-intensive and memory-intensive. A computer with at least **400 GB** of memory is required to perform the artifact evaluation for our work.
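
You can verify the available memory before starting (a simple check using standard Linux tools):
```
# Total memory in gigabytes; the "total" column should show at least ~400 GB
free -g
```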

## Repository Overview
- **air-infra:** Contains the base components of the ACE compiler.
- **fhe-cmplr:** Houses FHE-related components of the ACE compiler.
- **FHE-MP-CNN:** Directory with EXPERT-implemented source code.
- **model:** Stores pre-trained ONNX models.
- **nn-addon:** Includes ONNX-related components for the ACE compiler.
- **scripts:** Scripts for building and running ACE and EXPERT tests.
- **README.md:** This README file.
- **Dockerfile:** File used to build the Docker image.
- **requirements.txt:** Specifies Python package requirements.
### 1. Preparing a Docker Environment to Build and Test the ACE Compiler

It is recommended to pull the pre-built Docker image (`opencc/ace:latest`) from Docker Hub:
```
cd [YOUR_DIR_TO_DO_AE]
mkdir -p ace_ae_result
docker pull opencc/ace:latest
docker run -it --name ace -v "$(pwd)"/ace_ae_result:/app/ace_ae_result --privileged opencc/ace:latest bash
```
A local directory `ace_ae_result` is created and mounted in the Docker container to collect the generated figures and tables. The container will launch and automatically enter the `/app` directory:
```
root@xxxxxx:/app#
```
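If you exit the container later, you can re-enter the same session instead of creating a new one (standard Docker commands, not specific to this artifact):
```
# Restart and attach to the previously created container named "ace"
docker start -ai ace
```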
Alternatively, if you encounter issues pulling the pre-built image, you can build the image from the [*Dockerfile*](https://github.com/ace-compiler/ace-compiler/blob/main/Dockerfile):
```
cd [YOUR_DIR_TO_DO_AE]
git clone https://github.com/ace-compiler/ace-compiler.git
cd ace-compiler
mkdir -p ace_ae_result
docker build -t ace:latest .
docker run -it --name ace -v "$(pwd)"/ace_ae_result:/app/ace_ae_result --privileged ace:latest bash
```
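In either case, you can confirm that the image is available locally before running it (a basic Docker check; the image name depends on which route you took):
```
# List local images; expect opencc/ace or ace with the "latest" tag
docker images
```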

### 2. Building the ACE Compiler

To build the ACE compiler, navigate to the `/app` directory within the container and run:
```
/app/scripts/build_cmplr.sh Release
```
Upon successful completion, you will see:
```
Info: build project succeeded. FHE compiler executable can be found in /app/ace_cmplr/bin/fhe_cmplr
root@xxxxxx:/app#
```
The ACE compiler will be built under `/app/release` and installed in the `/app/ace_cmplr` directory.
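
As a quick sanity check, you can confirm that the installed executable exists at the path printed in the success message above:
```
# The compiler driver should be present and executable
ls -l /app/ace_cmplr/bin/fhe_cmplr
```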

### 3. Reproducing Figures 5-7 and Table 9

For a given machine learning model, an ACE test refers to a test conducted using the FHE-equivalent version of the model, generated by the ACE compiler, to perform encrypted inference. An EXPERT test refers to a test conducted using the expert hand-tuned FHE implementation from the paper [*Low-Complexity Deep Convolutional Neural Networks on Fully Homomorphic Encryption Using Multiplexed Parallel Convolutions*](https://eprint.iacr.org/2021/1688). Both ACE and EXPERT tests are performed for ResNet-[20|32|44|56|110] on CIFAR-10 and ResNet-32 on CIFAR-100 (referred to as ResNet-32*).

All pre-trained ONNX models utilized by the ACE compiler are located in the [*model*](https://github.com/ace-compiler/ace-compiler/tree/main/model) directory.

*Note: For the hardware environment outlined above, it will take **approximately 5 hours** to complete all the ACE tests and **around 13 hours** to complete all the EXPERT tests (using a single thread).*

#### 3.1 Building EXPERT Hand-Tuned Implementations

In the `/app` directory of the container, run:
```
python3 /app/FHE-MP-CNN/build_cnn.py
```
This will pull code from the EXPERT repository and build the executables. During the build process, it will download the external packages needed to build the SEAL library. Upon successful execution of the command, the following message will appear in the terminal:
```
[100%] Built target cnn
root@xxxxxx:/app#
```
The EXPERT source code will be pulled to `/app/FHE-MP-CNN/FHE-MP-CNN`, and the executables will be built in the `/app/FHE-MP-CNN/FHE-MP-CNN/cnn_ckks/build_cnn` directory.
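
To verify the build, you can list the build directory mentioned above (the exact set of files may vary):
```
# Expect the built EXPERT binaries to appear here
ls /app/FHE-MP-CNN/FHE-MP-CNN/cnn_ckks/build_cnn
```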


#### 3.2 Running All ACE and EXPERT Tests

In the `/app` directory of the container, run:
```
python3 /app/scripts/perf.py -a
```
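Because this step runs for many hours, you may prefer to launch it in the background so it survives a dropped SSH session (optional; this uses standard shell tools rather than anything provided by the artifact):
```
# Run detached, capture output, and follow progress
nohup python3 /app/scripts/perf.py -a > perf_run.out 2>&1 &
tail -f perf_run.out
```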
Performance data will be printed, and upon completion, you will see:
```
-------- Done --------
root@xxxxxx:/app#
```
A log file named with the date and time the command was launched will be generated, such as `2024_05_26_13_18.log`. You can refer to this log for performance data or failure information. For example, if you encounter a **"failed due to SIGKILL"** message, it is likely that you have run out of memory for an EXPERT case. If the process completes successfully, proceed by running:
```
python3 /app/scripts/generate_figures.py -f 2024_05_26_13_18.log
```
The script generates the results depicted in the figures and tables of our paper. The outputs are named `Figure5.pdf`, `Figure6.pdf`, `Figure7.pdf`, and `Table9.pdf`. For the raw data, please refer to the corresponding `*.log` files.
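
If the generated PDFs are not written into the mounted directory automatically, you can copy them there so they are visible on the host (a convenience step we suggest; adjust the source paths to wherever the PDFs were written):
```
# Copy the generated figures and table into the host-mounted directory
cp Figure5.pdf Figure6.pdf Figure7.pdf Table9.pdf /app/ace_ae_result/
```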

Here is what you can expect from each file:

- **Figure5.pdf**:
  
- **Figure6.pdf**:
  
- **Figure7.pdf**:
  
- **Table9.pdf**:
  

*Note: Figures 5-7 and Table 9 shown above use the same data as presented in our paper. However, the appearance of the generated PDF files might vary slightly due to differences in the hardware environments used.*

### 4. Reproducing Table 10

Table 10 compares, for each ResNet model used, the accuracy of encrypted inference against that of unencrypted inference using the same model.

For the data presented in Table 10 of our paper, we tested 1,000 images per model for both encrypted and unencrypted inference. The total time to perform unencrypted inference across all six models is only about one minute when using a single thread. In contrast, encrypted inference would require over 5,000 hours (more than 208 days) using a single thread.

Due to the extensive time required for encrypted inference, parallel execution is necessary. You are therefore encouraged to conduct this part of the evaluation on a multi-core platform, utilizing as many cores as are available.
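
You can check how many CPU cores are visible inside the container before starting (standard Linux tooling):
```
# Number of CPU cores available to the container
nproc
```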

#### 4.1 Building the ACE Compiler with OpenMP Support

In the `/app` directory of the container, run:
```
/app/scripts/build_cmplr_omp.sh Release
```
Upon successful completion, you will see the following message in the terminal:
```
Info: build project succeeded. FHE compiler executable can be found in release_openmp/driver/fhe_cmplr
root@xxxxxx:/app#
```
The OpenMP version of the ACE compiler will be built under `/app/release_openmp` and installed in the `/app/release_openmp/driver` directory.
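
If you wish to cap the number of threads used by the OpenMP-enabled binaries, the standard `OMP_NUM_THREADS` environment variable applies (a generic OpenMP knob; the artifact scripts may also manage parallelism themselves):
```
# Limit OpenMP parallel regions to 10 threads for subsequent runs in this shell
export OMP_NUM_THREADS=10
```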

#### 4.2 Performing Both Unencrypted and Encrypted Inference Tests

In the `/app` directory of the container, run:
```
python3 /app/scripts/accuracy_all.py -n 10
```
This will concurrently conduct encrypted inference tests on the first 10 images (indexed from 0 to 9) for each of the six ResNet models considered in the paper, leveraging all available CPU cores on the system. Unencrypted inference tests will likewise be performed in parallel. Assuming 10 cores are available, the expected completion time is **approximately 7 hours**. Upon completion, you will observe the following:
```
Table10-n-ImagesOnly.pdf generated!
root@xxxxxx:/app#
```

We have generated a version of Table 10 by testing only 10 images per model, as shown below:

 

Your version of Table 10 should closely resemble ours. Although these results differ from those reported in Table 10 of the paper, they already demonstrate that the accuracy achieved by encrypted inference under the ACE compiler is comparable to that achieved by unencrypted inference.

To run both unencrypted and encrypted inference tests for the first 1,000 images per model, execute the following command in the `/app` directory of the container:
```
python3 /app/scripts/accuracy_all.py -n 1000
```
This process will take **over 140 hours** to complete on the recommended computing platform, utilizing 64 threads.

The resulting output, `Table10.pdf`, will appear as follows:

 

*Note: The table displayed above is taken directly from our paper. The table you reproduce may look slightly different due to variations in the execution environments used.*