Z. Berkay Celik
344 IST Building
University Park, PA 16802
I am a research assistant in the Department of Electrical Engineering and Computer Science at the Pennsylvania State University, working with Prof. Patrick McDaniel, and a member of the Systems and Internet Infrastructure Security Laboratory (SIIS).
I’ve had the opportunity to work on a number of interesting research projects during my M.Sc. and Ph.D. studies. Here is a summary of some of my efforts.
Sensitive Information Tracking in Commodity IoT
We present SainT, a static taint analysis tool for IoT applications. SainT operates in three phases: (a) translating platform-specific IoT source code into an intermediate representation (IR), (b) identifying sensitive sources and sinks, and (c) performing static analysis to identify sensitive data flows. We evaluate SainT on 230 SmartThings market apps and find that 138 (60%) include sensitive data flows. In addition, we demonstrate SainT on IoTBench, a novel open-source test suite containing 19 apps with 27 unique data leaks. Through this effort, we introduce a rigorously grounded framework for evaluating the use of sensitive information in IoT apps, and thereby provide developers, markets, and consumers a means of identifying potential threats to security and privacy. Read more about this work.
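The three phases above can be illustrated with a toy forward taint propagation. This is a minimal sketch under stated assumptions, not SainT's actual IR or algorithm: the `Stmt` shape, the source and sink names, and the flow-insensitive fixpoint loop are all hypothetical choices made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Stmt(NamedTuple):
    """One toy IR statement: `target = op(args...)`; target is None for bare calls."""
    target: Optional[str]
    op: str
    args: Tuple[str, ...]


# Hypothetical source and sink operations (assumptions, not SainT's real lists).
SOURCES = {"device_state", "user_input"}
SINKS = {"send_sms", "http_post"}


def find_flows(program: List[Stmt]) -> List[Tuple[str, str, str]]:
    """Return (source_op, sink_op, variable) triples for tainted data reaching a sink.

    The analysis is flow-insensitive: it iterates over all statements until the
    taint sets stop changing, so it may over-approximate real data flows.
    """
    tainted = {}  # variable name -> set of source ops whose data it may carry
    changed = True
    while changed:
        changed = False
        for stmt in program:
            incoming = set()
            if stmt.op in SOURCES:
                incoming.add(stmt.op)
            for arg in stmt.args:
                incoming |= tainted.get(arg, set())
            if stmt.target is not None:
                before = tainted.get(stmt.target, set())
                if not incoming <= before:
                    tainted[stmt.target] = before | incoming
                    changed = True

    flows = []
    for stmt in program:
        if stmt.op in SINKS:
            for arg in stmt.args:
                for source in sorted(tainted.get(arg, set())):
                    flows.append((source, stmt.op, arg))
    return flows


# A toy app: the door lock state is read, embedded in a message, and texted out.
program = [
    Stmt("lock_state", "device_state", ("front_door",)),
    Stmt("msg", "concat", ("greeting", "lock_state")),
    Stmt("count", "const", ()),
    Stmt(None, "send_sms", ("phone", "msg")),
    Stmt(None, "log_local", ("count",)),
]
flows = find_flows(program)
```

Here `flows` reports that data from the `device_state` source reaches the `send_sms` sink through `msg`, while the untainted `count` passed to `log_local` is not flagged.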
Detection under Privileged Information
For over a quarter century, security-relevant detection has been driven by models learned from input features collected from real or simulated environments. An artifact (e.g., network event, potential malware sample, suspicious email) is deemed malicious or non-malicious based on its similarity to the learned model at runtime. However, the training of these models has historically been limited to only those features available at runtime. This talk covers an alternate model construction approach that trains models using forensic “privileged” information (features available at training time but not at runtime) to improve the accuracy and resilience of detection systems. Such techniques open the door to systems that can integrate forensic data directly into detection models, and thereby provide a means to fully exploit the information available about past security-relevant events.
Our paper, Detection under Privileged Information, was accepted to AsiaCCS’18. Read more about the formulation and implementation in our technical report, and about feature cultivation in our invited paper, Feature Cultivation in Privileged Information-augmented Detection.
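One way to train with features that are unavailable at runtime is a teacher-student setup: a teacher model learns from the privileged features, and a student that sees only runtime features is trained on a blend of the true labels and the teacher's soft predictions. The following is a minimal sketch of that idea, not the formulation from the paper; the toy data, the mixing weight `lam`, and the helper names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(model, x):
    """Probability of the malicious class under a logistic model (w, b)."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(xs, targets, epochs=2000, lr=0.5):
    """Batch gradient descent on cross-entropy; targets may be soft values in [0, 1]."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(xs, targets):
            err = predict((w, b), x) - t
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data (hypothetical): one runtime feature and one privileged forensic feature.
standard = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
privileged = [[0.0], [0.1], [0.2], [0.4], [0.6], [0.8], [0.9], [1.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# 1. The teacher sees privileged features, which exist only at training time.
teacher = fit_logistic(privileged, labels)
soft_labels = [predict(teacher, p) for p in privileged]

# 2. The student sees only runtime features and learns from blended targets.
lam = 0.5  # assumed weight on the hard labels versus the teacher's soft labels
blended = [lam * y + (1.0 - lam) * s for y, s in zip(labels, soft_labels)]
student = fit_logistic(standard, blended)

# 3. At runtime, detection uses the student and the runtime feature alone.
benign_score = predict(student, [0.1])
malicious_score = predict(student, [0.9])
```

The privileged features never appear in step 3, so the deployed detector needs nothing beyond what is observable at runtime.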
Machine Learning in Adversarial Settings
One of the limitations of machine learning in practice is that models are subject to adversarial samples. Adversarial samples are carefully modified inputs crafted to dictate a selected output. In the context of classification, adversarial samples are crafted so as to force a target model to classify them in a class different from their legitimate class. In this work, we focus on Deep Neural Networks (DNNs) for adversarial sample generation and on the attacker’s capabilities to evade systems built on DNNs. Check out our publications on adversarial machine learning: The Limitations of Deep Learning in Adversarial Settings and Practical Black-Box Attacks against Machine Learning.
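To make the notion of an adversarial sample concrete, the sketch below perturbs an input to a small linear classifier so that its predicted class flips. It uses a simple sign-of-gradient step on a linear model, which is an illustrative stand-in and not the crafting algorithms from the publications above; the weights, the input, and the budget `eps` are hypothetical values.

```python
def score(w, b, x):
    """Linear decision score; its gradient with respect to x is simply w."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def sign_perturb(w, b, x, eps):
    """Shift every feature by at most eps in the direction that pushes the
    score toward the opposite class."""
    direction = -1.0 if score(w, b, x) > 0 else 1.0
    return [xi + direction * eps * sign(wi) for wi, xi in zip(w, x)]


# Hypothetical model and input.
w, b = [2.0, -1.0, 0.5], -0.5
x = [0.6, 0.2, 0.4]

x_adv = sign_perturb(w, b, x, eps=0.25)
original_class = predict(w, b, x)
adversarial_class = predict(w, b, x_adv)
```

Each feature moves by at most `eps`, yet the score crosses the decision boundary and the predicted class changes. For a DNN the gradient would have to be computed through the network rather than read directly off the weights.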
Patient-Driven Privacy Control
Patients are asked to disclose personal information such as genetic markers, lifestyle habits, and clinical history. This data is then used by statistical models to predict personalized treatments. However, due to privacy concerns, patients often desire to withhold sensitive information. This self-censorship can impede proper diagnosis and treatment, which may lead to serious health complications and even death. We present privacy distillation, a mechanism that allows patients to control the type and amount of information they wish to disclose to healthcare providers for use in statistical models.
The paper Patient-Driven Privacy Control through Generalized Distillation (2016) was accepted to the IEEE Privacy-Aware Computing (PAC) conference. Working with patient data also led us to develop new algorithms for Achieving Secure and Differentially Private Computations in Multiparty Settings. That paper was also accepted to the PAC conference.
Malware Traffic Detection and Experimentation Artifacts
We present a framework for evaluating the transport layer feature space of malware heartbeat traffic. We use these features in a prototype detection system to distinguish malware traffic from traffic generated by legitimate applications. Further, we characterize the evolution of malware evasion techniques over time by examining the behavior of 16 malware families. In particular, we highlight the difficulty of detecting malware that uses traffic-shaping techniques to mimic legitimate traffic. Read more about the study here.
In our CSET 2011 work, my co-authors and I also take a closer look at the experimentation artifacts of malware detection. We find that current approaches do not consider timing-based calibration of the C&C traffic traces prior to using this traffic to salt a background traffic trace. Thus, timing-based features of the C&C traffic may be artificially distinctive, potentially leading to (unrealistically) optimistic flow classification results.
Science of Security
I am involved in the Cyber-Security Collaborative Research Alliance (CSec CRA) with the Army Research Laboratory, Penn State, Carnegie Mellon, UC Riverside, UC Davis, and Indiana University. Our mandate is to develop a new science of security. As part of this effort, I’ve worked on the foundation for representing operational and environmental knowledge. See our work on operational models. Our goal is to reason about both current and future states of a cyber-operation to make optimal decisions.