Projects

Projects, Articles, and Theses

Selected Projects

Testing Platform for Game Development Studios

The two years I spent at Epoch were divided into two phases, tailored to the startup's needs. The first year was dedicated to researching, designing, and implementing automated Game QA solutions. These ranged from simple capture-and-replay (performance-sensitive capture and replay of human-computer interactions) to model-based solutions (imitation learning on 3D videos of user input actions). The resulting product features included capture-and-replay testing, which enabled unit tests for gameplay; localization testing, which enabled unit tests for text scripts; and menu testing, which facilitated classical UX tests for games. I fully owned and developed many of these features from inception through deployment. The second year saw a necessary shift in the startup's resources and a transition from research to product. During this period, I supported the team with full-stack development and helped bring the product to market. This involved designing, implementing, and maintaining user-facing product features, such as video capture and streaming, collaborative team features that let users react to and interact with gameplay videos, and a custom dashboard infrastructure to meet various customer needs. Additionally, I was responsible for most of the internal infrastructure and cloud requirements.
Type Commercial Project
Timeline 2022 - Feb 2024
Duration 2 years
Responsibilities Senior Machine Learning Researcher - responsible for the initial research and ideation of automated game testing solutions, such as capture-and-replay engines, localization testing, and agent-based testing.
Senior Machine Learning Engineer - responsible for the design, development, and deployment of the machine learning platform and the structuring of the data flows.
Responsible for designing and implementing platform-essential features for mobile, web, and desktop.

Bid Management System for Google Shopping

A bid management system is often used in online marketing, where multiple advertisers compete for limited advertisement space on the internet. The project solved a multi-faceted regression problem at the intersection of feature engineering, regression, time-series analysis, incrementality testing, and large-scale daily ML Ops. One particular challenge is the broad spectrum of data encountered in a commercial system that actively influences the data it will consume in the future. Another challenge is the daily operation and quality assurance of the produced predictions. Commercial systems need multiple quality gates that slowly increase the system's impact radius while continuously monitoring machine learning and business performance.
Type Commercial Project
Timeline 2020 - 2022
Duration 2 years
Responsibilities Product Data Scientist - responsible for the strategic innovation and technical soundness of the system from a data science perspective.
Data Science Software Architect - responsible for the architectural evolution of the ML pipeline that is integrated into a larger set of systems.

Probabilistic Software Modeling

The project devised a new modeling paradigm that transforms a program into a probabilistic model. The resulting model can be used for analytical and generative applications in software engineering. For example, it can detect semantic clones (e.g., iterative and recursive implementations of factorial) within programs. Another example is the localization of faults based on semantic or behavioral divergences. Gradient, an implementation of Probabilistic Software Modeling, uses classical static code analysis, high-performance distributed computing for runtime monitoring, and state-of-the-art neural density estimation for building the probabilistic model.
Type Research Project
Timeline 2016 - 2021
Duration 5 years
Responsibilities Principal Researcher - developing the theoretical concepts and defining the strategic roadmap
Designer and Developer - designing and architecting workable, practical components from the theoretical concepts

Design Pattern Detection via Feature Maps

The project created an innovative representation of programs called feature maps. Feature maps are images of the structural properties of a program and can be used to detect design patterns. The system uses static code analysis and a wide range of graph transformations to extract crucial program properties. These properties are then projected into matrices that represent an image of the program.
Type Research Project
Timeline 2015 - 2016
Duration 1 year
Responsibilities Principal Researcher - developing the theoretical concepts and defining the strategic roadmap
Designer and Developer - designing and architecting workable, practical components from the theoretical concepts

Educator for Computer Science

Throughout my years at the Computer Science Faculty of Johannes Kepler University, which ranks in the top 3% worldwide, I have educated more than 500 students both indirectly (as a student teaching assistant) and directly (as a lecturer) in nearly all aspects of Computer Science and at various levels of study. These years shaped a highly cooperative and instructive approach to leading engineering teams in my later professional career, fostering productive teams with high cohesion and velocity.
Type Educational Projects
Timeline 2013 - 2020
Duration 7 years
Responsibilities Student Teaching Assistant - supporting students with material questions and grading of exercises
Lecturer - leading and designing exercise tracks, the mandatory practical tracks that run parallel to the theoretical tracks
Co-Supervision - supervising students and supporting them in designing and executing the research efforts for their theses

Papers

Semantic Clone Detection via Probabilistic Software Modeling

Presents and evaluates semantic clone detection using probabilistic software modeling.
Authors H. Thaller, L. Linsbauer, and A. Egyed
Published In Johnsen, E.B., Wimmer, M. (eds) Fundamental Approaches to Software Engineering. FASE 2022. Lecture Notes in Computer Science, vol 13241. Springer, Cham.
DOI 10.1007/978-3-030-99429-7_16
ISBN 978-3-030-99429-7

Towards Semantic Clone Detection via Probabilistic Software Modeling

Outlines the use of Probabilistic Software Modeling to detect semantically equivalent code elements.
Authors H. Thaller, L. Linsbauer, and A. Egyed
Published In 2020 IEEE 14th International Workshop on Software Clones (IWSC), London, ON, Canada, 2020, pp. 64-69
DOI 10.1109/IWSC50091.2020.9047635

Towards Fault Localization via Probabilistic Software Modeling

Outlines the use of Probabilistic Software Modeling to localize faults and their impact across program elements.
Authors H. Thaller, L. Linsbauer, A. Egyed, and S. Fischer
Published In 2020 IEEE 3rd International Workshop on Validation, Analysis, and Evolution of Software Tests (VST), London, ON, Canada, 2020, pp. 24-27
DOI 10.1109/VST50071.2020.9051635

An Empirical Evaluation for Object Initialization of Member Variables in Unit Testing

Explores the viability of unintrusive object initialization for test case generation.
Authors S. Fischer, E.N. Hasling, M. Zimmermann, and H. Thaller
Published In 2020 IEEE 3rd International Workshop on Validation, Analysis, and Evolution of Software Tests (VST), London, ON, Canada, 2020, pp. 8-11
DOI 10.1109/VST50071.2020.9051634

Feature Maps: A Comprehensible Software Representation for Design Pattern Detection

Feature Maps are a human-readable representation of software that is useful, e.g., for detecting design patterns via supervised machine learning (CNNs and RFs).
Authors H. Thaller, L. Linsbauer, and A. Egyed
Published In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), Hangzhou, China, 2019, pp. 207-217
DOI 10.1109/SANER.2019.8667978

Benefits and Drawbacks of Representing and Analyzing Source Code and Software Engineering Artifacts with Graph Databases

Insights and experiences of five cases using graph databases in static code analysis settings.
Authors R. Ramler, G. Buchgeher, C. Klammer, M. Pfeiffer, C. Salomon, H. Thaller, and L. Linsbauer
Published In Software Quality: The Complexity and Challenges of Software Engineering and Software Quality in the Cloud, vol. 338, D. Winkler, S. Biffl, and J. Bergsmann, Eds. Cham: Springer International Publishing, 2019, pp. 125–148.
DOI 10.1007/978-3-030-05767-1_9

Probabilistic Software Modeling

A modeling approach that analyzes the structure and behavior of applications and reconstructs them using a network of generative probabilistic models.
Authors H. Thaller
Published In 2018 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), Amsterdam, 2018, ECOOP and ISSTA Doc Symposium

Exploring Code Clones in Programmable Logic Controller Software

Code clones exist in PLC software, and its development can benefit from better tooling.
Authors H. Thaller, R. Ramler, J. Pichler, and A. Egyed
Published In 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Limassol, 2017, pp. 1-8.
DOI 10.1109/ETFA.2017.8247574

Subliminal Visual Information to Enhance Driver Awareness and Induce Behavior Change

Subliminal visual information has enormous potential to reduce the cognitive load of drivers, but it is too weak to stress critical behavior change.
Authors A. Riener and H. Thaller
Published In AutomotiveUI '14 Proceedings of the 6th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 1-9
DOI 10.1145/2667317.2667328

Theses

Probabilistic Software Modeling

Probabilistic Software Modeling transforms a program into a probabilistic model that enables analytical, generative, and inferential methods for program comprehension.
Authors H. Thaller
Supervisors A. Egyed and F. Khomh
Published In 2021

Towards Deep Learning Driven Design Pattern Detection

Convolutional Neural Networks can detect design patterns in a volumetric abstraction of the source code in even the most imbalanced settings.
Authors H. Thaller
Supervisor A. Egyed
Published In 2016

Driver Performance Manipulation via Visual Subliminal Cues

Subliminal cues are an unintrusive but weak mechanism for feeding information to drivers, e.g., to mitigate rear-end accidents.
Authors H. Thaller
Supervisor A. Riener
Published In 2014

Supervisions

Cluster Analysis for Multivariate Application Performance Management Issues

Clustering can control alarm floods in application performance management systems, improving the reporting and analysis of large-scale systems.
Authors V. Precup
Supervisors A. Egyed and H. Thaller
Published In 2018