Projects, Articles, and Theses

Projects

Testing Platform for Game Development

A testing service that gives game studios unit-test-like tests for games in video space, i.e., while the game is controlled and played naturally by testers.
Type Commercial Project
Timeline 2022 - current
Duration running
Responsibilities Machine Learning Engineer - responsible for the initial design and development of the machine learning platform and structuring of the data flows.
Machine Learning Researcher - responsible for the initial problem specification and solution design, as well as further ideation for downstream products in the context of game development QA.

Bid Management System for Google Shopping

A bid management system is often used in online marketing, where multiple advertisers compete for limited advertisement space on the internet. The project solved a multi-faceted regression problem spanning feature engineering, regression, time-series analysis, incrementality testing, and large-scale daily MLOps. One particular challenge is the broad spectrum of data encountered in a commercial system that actively influences the data it will consume in the future. Another is the daily operation and quality assurance of the produced predictions. Commercial systems need multiple quality gates that slowly increase the impact radius of their effects while continuously monitoring machine learning and business performance.
Type Commercial Project
Timeline 2020 - 2022
Duration 2 years
Responsibilities Product Data Scientist - responsible for the strategic innovation and technical soundness of the system from a data science perspective.
Data Science Software Architect - responsible for the architectural evolution of the ML pipeline that is integrated into a larger set of systems.

Probabilistic Software Modeling

The project devised a new modeling paradigm that transforms a program into a probabilistic model. The resulting model can be used for analytical and generative applications in software engineering. For example, it can detect semantic clones (e.g., iterative and recursive implementations of factorial) within programs. Another example is the localization of faults based on semantic or behavioral divergences. Gradient, an implementation of Probabilistic Software Modeling, uses classical static code analysis, high-performance distributed computing for runtime monitoring, and state-of-the-art neural density estimation for building the probabilistic model.
Type Research Project
Timeline 2016 - 2021
Duration 5 years
Responsibilities Principal Researcher - developing the theoretical concepts and defining the strategic roadmap
Designer and Developer - designing and architecting workable, practical components from the theoretical concepts

Design Pattern Detection via Feature Maps

The project created an innovative representation of programs called feature maps. Feature maps are images of the structural properties of a program and can be used to detect design patterns. The system uses static code analysis and a wide range of graph transformations to extract crucial program properties. These properties are then projected into matrices that represent an image of the program.
Type Research Project
Timeline 2015 - 2016
Duration 1 year
Responsibilities Principal Researcher - developing the theoretical concepts and defining the strategic roadmap
Designer and Developer - designing and architecting workable, practical components from the theoretical concepts

Papers

Semantic Clone Detection via Probabilistic Software Modeling

Presents and evaluates semantic clone detection using probabilistic software modeling.
Authors H. Thaller, L. Linsbauer, and A. Egyed
Published In Johnsen, E.B., Wimmer, M. (eds) Fundamental Approaches to Software Engineering. FASE 2022. Lecture Notes in Computer Science, vol 13241. Springer, Cham.
DOI 10.1007/978-3-030-99429-7_16
ISBN 978-3-030-99429-7

Towards Semantic Clone Detection via Probabilistic Software Modeling

Outlines the use of Probabilistic Software Modeling to detect semantically equivalent code elements.
Authors H. Thaller, L. Linsbauer, and A. Egyed
Published In 2020 IEEE 14th International Workshop on Software Clones (IWSC), London, ON, Canada, 2020, pp. 64–69
DOI 10.1109/IWSC50091.2020.9047635

Towards Fault Localization via Probabilistic Software Modeling

Outlines the use of Probabilistic Software Modeling to localize faults and their impact across program elements.
Authors H. Thaller, L. Linsbauer, A. Egyed, and S. Fischer
Published In 2020 IEEE 3rd International Workshop on Validation, Analysis, and Evolution of Software Tests (VST), London, ON, Canada, 2020, pp. 24–27
DOI 10.1109/VST50071.2020.9051635

An Empirical Evaluation for Object Initialization of Member Variables in Unit Testing

Explores the viability of unintrusive object initialization for test case generation.
Authors S. Fischer, E.N. Hasling, M. Zimmermann, and H. Thaller
Published In 2020 IEEE 3rd International Workshop on Validation, Analysis, and Evolution of Software Tests (VST), London, ON, Canada, 2020, pp. 8–11
DOI 10.1109/VST50071.2020.9051634

Feature Maps: A Comprehensible Software Representation for Design Pattern Detection

Feature Maps are a human-readable representation of software that is useful, e.g., for detecting design patterns via supervised machine learning (CNNs and RFs).
Authors H. Thaller, L. Linsbauer, and A. Egyed
Published In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), Hangzhou, China, 2019, pp. 207-217
DOI 10.1109/SANER.2019.8667978

Benefits and Drawbacks of Representing and Analyzing Source Code and Software Engineering Artifacts with Graph Databases

Insights and experiences of five cases using graph databases in static code analysis settings.
Authors R. Ramler, G. Buchgeher, C. Klammer, M. Pfeiffer, C. Salomon, H. Thaller, and L. Linsbauer
Published In Software Quality: The Complexity and Challenges of Software Engineering and Software Quality in the Cloud, vol. 338, D. Winkler, S. Biffl, and J. Bergsmann, Eds. Cham: Springer International Publishing, 2019, pp. 125–148.
DOI 10.1007/978-3-030-05767-1_9

Probabilistic Software Modeling

A modeling approach that analyzes the structure and behavior of applications and reconstructs them using a network of generative probabilistic models.
Authors H. Thaller
Published In 2018 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), Amsterdam, 2018, ECOOP and ISSTA Doc Symposium

Exploring Code Clones in Programmable Logic Controller Software

Code clones exist in PLC software, and its development can benefit from better tooling.
Authors H. Thaller, R. Ramler, J. Pichler, and A. Egyed
Published In 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Limassol, 2017, pp. 1-8.
DOI 10.1109/ETFA.2017.8247574

Subliminal Visual Information to Enhance Driver Awareness and Induce Behavior Change

Subliminal visual information has enormous potential to reduce the cognitive load of drivers, but it is too weak to stress critical behavior change.
Authors A. Riener and H. Thaller
Published In AutomotiveUI '14: Proceedings of the 6th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 1-9
DOI 10.1145/2667317.2667328

Theses

Probabilistic Software Modeling

Probabilistic Software Modeling transforms a program into a probabilistic model that enables analytical, generative, and inferential methods for program comprehension.
Authors H. Thaller
Supervisor A. Egyed and F. Khomh
Published In 2021

Towards Deep Learning Driven Design Pattern Detection

Convolutional Neural Networks can detect design patterns in a volumetric abstraction of the source code in even the most imbalanced settings.
Authors H. Thaller
Supervisor A. Egyed
Published In 2016

Driver Performance Manipulation via Visual Subliminal Cues

Subliminal cues are an unintrusive but weak mechanism for feeding information to drivers, e.g., to mitigate rear-end accidents.
Authors H. Thaller
Supervisor A. Riener
Published In 2014

Supervisions

Cluster Analysis for Multivariate Application Performance Management Issues

Clustering can control alarm floods in application performance management systems, improving the reporting and analysis of large-scale systems.
Authors V. Precup
Supervisor A. Egyed and H. Thaller
Published In 2018