Exploring Computer Science, AI, Woods, and Waves

A collection of my long-form thoughts on computer science, entrepreneurship, digital nomadism, woods, and waves, organized in chronological order.

Semantic Clone Detection: A Breakthrough in Software Similarity Analysis

Semantic clone detection identifies program elements with similar runtime behavior, even when they share 0% syntactic similarity. This article introduces SCD-PSM (Semantic Clone Detection via Probabilistic Software Modeling) as a precise and stable solution. PSM builds a probabilistic model of a program that can both evaluate and generate runtime data. SCD-PSM detects behaviorally equal elements and generalizes them to semantic equality using likelihood-based distance metrics, with a significance test to control false positives. It achieves a Matthews Correlation Coefficient above 0.9 on classic and complex clone detection challenges, including coding competitions.
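For readers unfamiliar with the metric, here is a minimal sketch of how a Matthews Correlation Coefficient is computed from a confusion matrix. The function name and the counts in the usage note are illustrative assumptions, not figures from the SCD-PSM evaluation.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    0 is what random guessing would score.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0
```

As a made-up example, 45 clone pairs correctly detected, 45 non-clone pairs correctly rejected, 5 false alarms, and 5 misses give `matthews_corrcoef(45, 45, 5, 5)`, which evaluates to 0.8. Unlike plain accuracy, the score stays informative when clones are much rarer than non-clones.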

Probabilistic Software Modeling: A New Paradigm for Software Analysis

Software development involves navigating complex and often unpredictable behaviors. Traditional methods for understanding software structure and behavior rely on static analysis and testing, which may fail to capture hidden relationships and runtime uncertainties. Probabilistic Software Modeling (PSM) introduces a novel paradigm that transforms programs into probabilistic models, enabling enhanced fault detection, semantic code analysis, and predictive insights. This blog explores the core concepts behind PSM and its potential impact on software engineering.

Fault Localization with Probabilistic Software Modeling: A New Approach

Fault localization remains a critical challenge in software development, as traditional techniques struggle with complex, multi-line errors and high false-positive rates. This article introduces Fault Localization via Probabilistic Software Modeling (FL-PSM), an approach that leverages probabilistic models to analyze software behavior dynamically and identify faults more accurately. By comparing runtime data against a probabilistic baseline, FL-PSM improves fault detection, enhances debugging efficiency, and reduces false positives. We explore its methodology, application to a real-world example, and its potential advantages over existing techniques.
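The idea of comparing runtime data against a probabilistic baseline can be sketched roughly as follows. This is not the FL-PSM implementation: a single Gaussian stands in for whatever models PSM actually fits, and the function names, the threshold, and the sample values are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def fit_baseline(samples):
    """Fit a Gaussian to values observed during known-good runs (assumed model)."""
    return NormalDist.from_samples(samples)


def log_likelihood(baseline, x):
    """Log-density of x under the baseline, computed in log space to avoid underflow."""
    variance = baseline.stdev ** 2
    return -0.5 * math.log(2 * math.pi * variance) - (x - baseline.mean) ** 2 / (2 * variance)


def suspicious_values(baseline, observations, threshold=-10.0):
    """Return observations whose log-likelihood falls below the (hypothetical) threshold."""
    return [x for x in observations if log_likelihood(baseline, x) < threshold]
```

With a baseline fitted on `[9.8, 10.1, 10.0, 9.9, 10.2]`, an observation of `10.05` is unremarkable while `25.0` is flagged. A real system would have to rank many program elements by such scores rather than inspect a single variable.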

How Easily Can We Set Object States for Unit Testing?

Automated test case generation plays a crucial role in software testing, but effectively setting the internal state of objects remains a challenge. This blog explores an empirical study analyzing 110 open-source Java projects to determine how easily object fields can be initialized for testing. The findings reveal that while 66.5% of fields can be directly set to a desired value, 31.5% require further analysis, and 2% remain unmodifiable. Understanding these limitations is essential for improving automated test generation and ensuring realistic test cases.

Feature Maps: A Smarter Approach to Design Pattern Detection

Design patterns are fundamental in software engineering, but identifying them in source code can be challenging due to variations in implementation. Feature Maps provide a structured, human- and machine-comprehensible representation of software, enabling more effective design pattern detection. This blog explores how Feature Maps, combined with machine learning techniques like Random Forests and Convolutional Neural Networks (CNNs), improve the accuracy and interpretability of pattern detection, even in highly imbalanced data scenarios.

Graph Databases for Source Code and Software Engineering Analysis

As software systems grow in complexity, understanding their structure, dependencies, and evolution becomes a significant challenge. Traditional relational databases often struggle to efficiently store and query highly interconnected data such as source code elements, dependencies, and software artifacts. This blog explores how graph databases provide a powerful alternative, enabling scalable, flexible, and efficient analysis of software systems. By representing software components as nodes and their relationships as edges, graph databases facilitate advanced static analysis, dependency tracking, and system evolution studies, improving both maintainability and software quality.
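To make the node-and-relationship idea concrete, here is a small in-memory sketch of a dependency query. It uses a plain Python dictionary as a stand-in for a graph database, and the component names are hypothetical; a real graph database would express the same traversal in its own query language.

```python
from collections import deque

# Hypothetical code elements (nodes) and their "depends on" relationships (edges).
DEPENDS_ON = {
    "OrderService": ["OrderRepository", "PaymentClient"],
    "OrderRepository": ["Database"],
    "PaymentClient": ["HttpClient"],
    "Database": [],
    "HttpClient": [],
}


def transitive_dependencies(graph, start):
    """Return every node reachable from start via breadth-first search, excluding start."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen or node == start:
            continue  # already visited, or a cycle leading back to the start
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen
```

Calling `transitive_dependencies(DEPENDS_ON, "OrderService")` collects all four other components. This kind of multi-hop traversal is the sort of query that tends to require repeated self-joins in a relational schema.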

Using Cluster Analysis to Improve Application Performance Management

As software systems grow in complexity, their monitoring becomes increasingly challenging, often resulting in overwhelming amounts of alerts and notifications. This blog explores the use of cluster analysis to systematically group recurring monitoring issues, reducing noise and improving the efficiency of Application Performance Management (APM). By applying unsupervised learning techniques and optimizing clustering algorithms, the approach enables more insightful problem reporting, making it easier for system administrators to detect and prioritize critical issues.
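As a rough illustration of grouping recurring alerts, the sketch below clusters alert messages greedily by word overlap (Jaccard similarity). This is a deliberately simple stand-in, not the clustering algorithm from the study; the function names, the similarity threshold, and the alert texts are assumptions.

```python
def tokens(message):
    """Lowercased set of whitespace-separated words in an alert message."""
    return set(message.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_alerts(messages, threshold=0.5):
    """Greedily assign each message to the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative token set, member messages)
    for message in messages:
        current = tokens(message)
        for representative, members in clusters:
            if jaccard(representative, current) >= threshold:
                members.append(message)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append((current, [message]))
    return [members for _, members in clusters]
```

Given the alerts "High CPU usage on host-a", "High CPU usage on host-b", and "Disk full on host-a", the two CPU alerts land in one group and the disk alert in another, so an administrator sees two issues instead of three notifications.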

Unraveling Code Clones in Programmable Logic Controller (PLC) Software

Code cloning is a common practice in software development, particularly in industrial automation where Programmable Logic Controller (PLC) software is developed using IEC 61131-3 Structured Text (ST) and C/C++. While cloning facilitates rapid development, it also introduces maintenance challenges. This study explores the nature of code clones in PLC software, extending an existing detection tool to support ST. The findings highlight the prevalence of clones, differences between C/C++ and ST cloning patterns, and the necessity of specialized clone management strategies. By adopting automated tools and refactoring approaches, software teams can improve maintainability and reduce technical debt.

Driving Performance Manipulation via Visual Subliminal Cues

Driving safety and performance are critical concerns in modern transportation. With increasing in-vehicle distractions and cognitive loads, researchers are exploring alternative guidance techniques that do not interfere with conscious perception. This blog delves into the concept of visual subliminal cues—stimuli presented below the threshold of conscious awareness—to subtly influence driver behavior. Through a controlled driving simulation experiment, this research assesses whether such cues can enhance reaction times and lane-changing efficiency. The findings suggest that while subliminal priming can be successfully applied, its effectiveness in improving driving performance remains inconclusive.