Computer scientist, founder, and surfer.

I’m Hannes, the founder and CEO of Segmnts, where we help creators and AI co-workers form dream teams. When I’m not working, you’ll find me surfing or admiring weird trees.

Semantic Clone Detection: A Breakthrough in Software Similarity Analysis

Semantic clone detection identifies program elements with similar runtime behavior, even when they share 0% syntactic similarity. This article introduces SCD-PSM (Semantic Clone Detection via Probabilistic Software Modeling) as a precise and stable solution. PSM builds a probabilistic model of a program that can both evaluate and generate runtime data. SCD-PSM detects behaviorally equal elements and generalizes them to semantic equality using likelihood-based distance metrics, with a significance test to control false positives. It achieves a Matthews Correlation Coefficient above 0.9 on both classic and complex clone detection challenges, including those from coding competitions.
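To make the idea of a semantic clone concrete, here is a minimal Python sketch with two functions that share almost no syntax but compute the same result. The function names and the `behaves_alike` helper are hypothetical, and the helper only compares outputs on random inputs; it is a simple stand-in for illustration, not the likelihood-based distance or significance test that SCD-PSM uses.

```python
import random


def sum_iterative(n: int) -> int:
    """Sum the integers 1..n with an explicit loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_closed_form(n: int) -> int:
    """Sum the integers 1..n with the closed-form formula."""
    return n * (n + 1) // 2


def behaves_alike(f, g, trials: int = 1000, seed: int = 0) -> bool:
    """Naive behavioral check (an assumption for illustration):
    call both functions on the same random inputs and compare outputs."""
    rng = random.Random(seed)
    inputs = (rng.randint(0, 500) for _ in range(trials))
    return all(f(n) == g(n) for n in inputs)
```

Calling `behaves_alike(sum_iterative, sum_closed_form)` returns `True` even though a text-based comparison of the two function bodies would find little in common.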

Probabilistic Software Modeling: A New Paradigm for Software Analysis

Software development involves navigating complex and often unpredictable behaviors. Traditional methods for understanding software structure and behavior rely on static analysis and testing, which may fail to capture hidden relationships and runtime uncertainties. Probabilistic Software Modeling (PSM) introduces a novel paradigm that transforms programs into probabilistic models, enabling enhanced fault detection, semantic code analysis, and predictive insights. This post explores the core concepts behind PSM and its potential impact on software engineering.

Fault Localization with Probabilistic Software Modeling: A New Approach

Fault localization remains a critical challenge in software development, as traditional techniques struggle with complex, multi-line errors and high false-positive rates. This article introduces Fault Localization via Probabilistic Software Modeling (FL-PSM), an approach that leverages probabilistic models to analyze software behavior dynamically and identify faults more accurately. By comparing runtime data against a probabilistic baseline, FL-PSM improves fault detection, enhances debugging efficiency, and reduces false positives. We explore its methodology, application to a real-world example, and its potential advantages over existing techniques.
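The idea of comparing runtime data against a probabilistic baseline can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not FL-PSM itself: it assumes each observed program variable is modeled by an independent Gaussian fitted to values from passing runs, and all names (`fit_baseline`, `log_likelihood`, `rank_suspects`) are hypothetical.

```python
import math
import statistics


def fit_baseline(values):
    """Fit a Gaussian (mean, standard deviation) to values from passing runs.
    Assumes at least two values with non-zero spread."""
    return statistics.mean(values), statistics.stdev(values)


def log_likelihood(x, mean, std):
    """Log-density of x under a Gaussian with the given mean and std."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def rank_suspects(baseline, observed):
    """Rank variable names from least to most likely under the baseline.
    The least likely observations are the first candidates to inspect."""
    scores = {}
    for name, values in baseline.items():
        mean, std = fit_baseline(values)
        scores[name] = log_likelihood(observed[name], mean, std)
    return sorted(scores, key=scores.get)
```

With a baseline such as `{"price": [9.9, 10.0, 10.1, 10.0], "quantity": [1, 2, 3, 2]}` and a failing run that observes `{"price": 10.0, "quantity": 40}`, the `quantity` variable is ranked first because its value is very unlikely under the fitted model.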

How Easily Can We Set Object States for Unit Testing?

Automated test case generation plays a crucial role in software testing, but effectively setting the internal state of objects remains a challenge. This post discusses an empirical study of 110 open-source Java projects that measures how easily object fields can be initialized for testing. The findings reveal that while 66.5% of fields can be directly set to a desired value, 31.5% require further analysis, and 2% remain unmodifiable. Understanding these limitations is essential for improving automated test generation and ensuring realistic test cases.

Work

  1. Segmnts, CEO
  2. Epoch ML, Lead Machine Learning Engineer
  3. Sourceflow Computational Intelligence, CEO & Research Consultant for AI/ML
  4. Smarter Ecommerce, Senior Data Scientist
  5. Johannes Kepler University, Researcher