CDCC is Recruiting 2025 Undergraduate Summer Intern and REU Students

For more information, please contact [email protected].

Yale University

Accountable Certificate Authority using Trusted Execution Environments and Smart Contracts (PI Fan Zhang). Certificate Authorities (CAs) issue certificates that associate a domain name with its owner’s public key after verifying ownership. CAs are the root of trust for the Internet, and CA misbehavior (e.g., issuing bogus certificates) can cause significant disruption and economic loss; however, the current approach can only detect misbehavior after the fact. As a result, users (or browsers) cannot penalize a problematic CA beyond ceasing to use it.

To make CAs more trustworthy and to enable automatic compensation for affected users, we propose to extend our recent work (CrudiTEE) to improve the accountability of the ACME protocol. The ACME protocol issues certificates to domain owners upon request, provided they demonstrate ownership of the domain through either a DNS challenge or an HTTP challenge. To apply CrudiTEE to a CA, we need to make the authorization process accountable, which can be achieved by requiring a DNS challenge with DNSSEC. In the DNS challenge, the domain owner places a specific value in a DNS record under her domain name, signed with her DNS key (a feature provided by DNSSEC). This signature can then serve as the proof of authorization.
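As a concrete illustration of this flow, the minimal sketch below uses the dnspython library to fetch a DNS-01 challenge TXT record together with its RRSIG and verify the signature against the zone’s DNSKEY; the domain, nameserver address, and record name are placeholders, and this is an assumed verifier implementation rather than the project’s actual protocol.

```python
# Hypothetical sketch: verify a DNSSEC-signed DNS-01 challenge record.
# Domain, nameserver IP, and record names are placeholders.
import dns.dnssec
import dns.message
import dns.query
import dns.name
import dns.rdataclass
import dns.rdatatype

ZONE = dns.name.from_text("example.com.")
CHALLENGE = dns.name.from_text("_acme-challenge.example.com.")
NS_IP = "203.0.113.53"  # placeholder authoritative nameserver

def query_with_rrsig(qname, rdtype):
    """Fetch an RRset and the RRSIG covering it from the authoritative server."""
    q = dns.message.make_query(qname, rdtype, want_dnssec=True)
    resp = dns.query.udp(q, NS_IP, timeout=5)
    rrset = resp.find_rrset(resp.answer, qname, dns.rdataclass.IN, rdtype)
    rrsig = resp.find_rrset(resp.answer, qname, dns.rdataclass.IN,
                            dns.rdatatype.RRSIG, rdtype)
    return rrset, rrsig

# Fetch the zone's DNSKEY set. (A full validator would authenticate this set
# up the chain of trust to the root; that step is omitted here for brevity.)
dnskey, dnskey_sig = query_with_rrsig(ZONE, dns.rdatatype.DNSKEY)
dns.dnssec.validate(dnskey, dnskey_sig, {ZONE: dnskey})

# Fetch and verify the challenge TXT record. The validated (rrset, rrsig)
# pair is the kind of transferable authorization proof described above.
txt, txt_sig = query_with_rrsig(CHALLENGE, dns.rdatatype.TXT)
dns.dnssec.validate(txt, txt_sig, {ZONE: dnskey})
print("DNS-01 challenge record verified:", txt)
```

Because the RRSIG is produced by the domain owner’s DNS key, anyone holding the validated record set can later re-check the signature, which is what makes the authorization auditable after the fact.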

The summer intern will work with researchers to design and implement this protocol.

Indiana University Bloomington

Confidential Computing for Biomedical Data Protection

The project focuses on developing a big-data analytics framework built on Trusted Execution Environments (TEEs), Intel Software Guard Extensions (SGX) in particular, and applying it to support privacy-preserving, large-scale genomic data analyses and other computing tasks. Based on an understanding of the unique performance impacts of SGX systems, including those incurred by enclave creation, management, trust establishment, and cross-enclave communication, a new MPI-based cluster computing framework is built to automatically optimize the deployment of computing nodes across enclaves and CPU packages under resource constraints. This new framework supports a set of fundamental genomic computing tasks, ranging from reads mapping to peptide identification, as well as machine-learning-based models. Its potential risks, side-channel leaks in particular, are also analyzed and effectively controlled to provide high privacy assurance. The work will enable broad sharing of previously inaccessible data and help drive new insights in individualized health care.
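For a flavor of the MPI-based design, here is a minimal, hypothetical sketch using mpi4py; the map_reads stub and the chunking scheme are placeholders rather than the project’s actual framework, and in the real system each worker rank would run inside its own SGX enclave.

```python
# Illustrative sketch only: scatter reads-mapping work across MPI ranks,
# compute locally, and gather results at rank 0.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def map_reads(chunk):
    """Placeholder for an in-enclave reads-mapping kernel."""
    return [(read, hash(read) % 1000) for read in chunk]  # fake positions

if rank == 0:
    reads = [f"READ_{i}" for i in range(1000)]
    chunks = [reads[i::size] for i in range(size)]  # one chunk per rank
else:
    chunks = None

# Scatter work to the (enclave-hosted) workers and collect the mapped reads.
local = comm.scatter(chunks, root=0)
results = comm.gather(map_reads(local), root=0)

if rank == 0:
    print(f"mapped {sum(len(r) for r in results)} reads across {size} ranks")
```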

The summer intern will work with researchers on this cutting-edge research direction and learn the basic technical and research skills that will help make this new innovation possible. 

Secure Automated De-identification of Patient Health Records Using Confidential Computing


The Health Insurance Portability and Accountability Act (HIPAA) mandates strict guidelines to protect the privacy and security of sensitive patient health information, requiring the removal of personally identifiable information (PII) from medical records before they can be used for research. This REU project addresses the need for HIPAA-compliant de-identification by developing an automated system that leverages a lightweight Large Language Model (LLM) to identify and mask PII in clinical documents while preserving the data’s research utility. The de-identification process will be securely deployed within a confidential computing environment using AMD’s Secure Encrypted Virtualization (SEV) technology. This approach provides strong data privacy and security, offering students hands-on experience at the intersection of natural language processing, healthcare data privacy, and secure computing, and tackling critical challenges in protecting patient information in research and clinical applications.
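As a rough illustration of the masking step, the sketch below runs a Hugging Face token-classification pipeline over a clinical note and replaces detected PII spans with typed placeholders; the model name is one publicly available de-identification model chosen only as an example, and the project’s lightweight LLM may differ.

```python
# Hypothetical sketch of PII masking with a token-classification model.
# The model name below is an example public de-id model, not necessarily
# the one this project will use.
from transformers import pipeline

ner = pipeline("token-classification",
               model="obi/deid_roberta_i2b2",
               aggregation_strategy="simple")

def mask_pii(note: str) -> str:
    """Replace each detected PII span with a [TYPE] placeholder."""
    # Replace from the end of the string so earlier offsets stay valid.
    for ent in sorted(ner(note), key=lambda e: e["start"], reverse=True):
        note = note[:ent["start"]] + f"[{ent['entity_group']}]" + note[ent["end"]:]
    return note

print(mask_pii("Jane Doe, DOB 03/14/1962, was seen at Mercy Hospital."))
```

In the envisioned system, this entire pipeline would execute inside an SEV-protected virtual machine so that the raw, identified notes are never exposed to the host.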

Privacy-Preserving Use of Vectorized Patient Health Records in LLM/RAG Applications via Confidential Computing


This REU project will explore potential privacy leaks in the vectorized representation of patient health records and investigate secure methods to mitigate these risks within confidential computing environments. With the rise of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques, healthcare researchers are increasingly utilizing vectorized patient data to enhance information retrieval and personalized medicine applications. While these technologies offer tremendous potential to advance healthcare research, they also introduce privacy concerns, as vector embeddings can inadvertently reveal sensitive information. This project will involve assessing privacy vulnerabilities in health data embeddings and developing confidential computing solutions, such as AMD’s Secure Encrypted Virtualization (SEV), to secure LLM/RAG applications using vector databases. Participants will gain hands-on experience in applying privacy-preserving strategies in AI-driven healthcare research, contributing to a critical area of health informatics and data security.
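The following minimal sketch shows the retrieval step such applications rely on: patient notes are embedded into vectors, and a query embedding retrieves the most similar notes. The model name and notes are illustrative placeholders; the point is that these embeddings, and anything reconstructable from them, are exactly the asset that must stay inside the confidential computing boundary.

```python
# Illustrative sketch of embedding-based retrieval over patient notes.
# Model name and note contents are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

notes = [
    "Patient reports chest pain radiating to the left arm.",
    "Follow-up for type 2 diabetes; A1C improved to 6.8.",
]
note_vecs = model.encode(notes, normalize_embeddings=True)

def retrieve(query: str, k: int = 1):
    """Return the k notes whose embeddings are closest to the query's."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = note_vecs @ q  # cosine similarity (vectors are normalized)
    return [(float(scores[i]), notes[i]) for i in np.argsort(-scores)[:k]]

print(retrieve("cardiac symptoms"))
```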

Carnegie Mellon University

Human Factors in Distributed Confidential Computing

Mentor: Lorrie Cranor

Description and Significance
The goal of Distributed Confidential Computing (DCC) is to enable scalable data-in-use protections for cloud and edge systems, such as home IoT. The “protections” offered by DCC depend on the users and what privacy protections they require. The goal of this project is to (1) determine what kinds of protections key stakeholders would want, and (2) design an interface for describing these protections.

Student Involvement
Students will learn how to conduct research in usable privacy and security by working with a mentor to conduct a user study that will identify the privacy preferences of DCC stakeholders. Based on this data, the team will design and implement prototype interfaces, which will be evaluated in another user study. Depending on interests and project needs, the student may help set up online surveys and collect data on a crowd worker platform, perform qualitative and/or quantitative data analysis, or design and implement prototypes.

References
Center for Distributed Confidential Computing. https://nsf-cdcc.org/

Hana Habib and Lorrie Faith Cranor. Evaluating the Usability of Privacy Choice Mechanisms. SOUPS '22. https://www.usenix.org/system/files/soups2022-habib.pdf

Investigating Multi-factor Authentication Phishing

Mentor: Lorrie Cranor

Description and Significance
Two-factor authentication (2FA) or, more generally, multi-factor authentication (MFA) is a common approach used by large organizations to protect accounts from compromise. While introducing MFA makes traditional phishing attacks more difficult, attackers have adopted new ruses to trick users into authenticating. For example, an attacker may contact the victim in order to induce them to approve a login prompt or provide an authentication code¹. Despite the rise in social engineering attacks on MFA, there is limited research² exploring why users are susceptible to MFA attacks or what additional safety measures would strengthen resilience to MFA social engineering attacks. To rectify this gap, we plan to conduct a multi-stage study investigating users’ interactions with MFA phishing in the university setting.

Student Involvement
Students will work with a graduate student to help conduct a user study related to MFA phishing. Through this process, they will learn about how HCI research methods (e.g., interviews, surveys, etc.) are applied to computer security and privacy issues. Based on student interest and the results of ongoing research, students may be involved in all stages of the research process, including design, execution, and analysis. Students may also help to build web infrastructure for simulating MFA phishing attacks. 

References
¹ Siadati, Hossein, et al. “Mind your SMSes: Mitigating social engineering in second factor authentication.” Computers & Security 65 (2017): 14-28.

² Burda, Pavlo, Luca Allodi, and Nicola Zannone. “Cognition in social engineering empirical research: a systematic literature review.” ACM Transactions on Computer-Human Interaction 31.2 (2024): 1-55.

Safety Labels for GenAI Applications

Mentor: Lorrie Cranor

Description and Significance
The rapid proliferation of Generative AI (GenAI) in consumer applications has sparked calls for transparent methods to assess and communicate the safety risks of these technologies to consumers, users, and the general public. This includes concerns about bias, toxicity, misinformation, security and privacy, and beyond. The scope of these issues extends from foundation models (e.g., GPT-4) to their diverse applications (e.g., ChatGPT). Yet existing evaluation methods heavily focus on benchmarking foundation models, overlooking the complicated interactions between users and GenAI-powered applications in the context of use. Academia, industry, and policymakers are all advocating for safety labels to inform users and consumers about the potential risks and harms of GenAI applications. Despite these initiatives, challenges remain in how to effectively measure safety risks considering the human-AI interaction layer, how to design labels that are both useful and usable for users and those impacted, and how to responsibly deploy safety labels in practice through real-world partnerships. The goal of this project is to develop a dynamic, expert-informed, user-centered approach to evaluate GenAI-powered applications and create public-facing safety labels.

Student Involvement
Students will learn how to conduct research in usable privacy and security by working with a mentor to design and/or conduct a study that will inform the design of GenAI safety labels. Depending on interests and project needs, the student may help set up online surveys and collect data on a crowd worker platform, help design and conduct interviews or focus groups, perform qualitative and/or quantitative data analysis, or design and implement prototypes.

References
The CUPS Lab has done prior label development work in other areas, including privacy nutrition labels (https://cups.cs.cmu.edu/privacyLabel/) and IoT security and privacy labels (https://iotsecurityprivacy.org/).