Semantic Clone Detection: A Breakthrough in Software Similarity Analysis
Semantic clone detection identifies program elements that behave similarly at runtime, even when they share 0% syntactic similarity. This article introduces SCD-PSM (Semantic Clone Detection via Probabilistic Software Modeling) as a precise and stable solution. PSM builds a probabilistic model of a program that can both evaluate and generate runtime data. SCD-PSM uses this model to detect behaviorally equal elements and then generalizes that behavioral equality to semantic equality. It relies on likelihood-based distance metrics and a significance test to control false positives. It achieves a Matthews Correlation Coefficient (MCC) above 0.9 and performs well on both classic and complex clone detection challenges, including those drawn from coding competitions.
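
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Matthews Correlation Coefficient (MCC) is computed from a binary confusion matrix of clone/non-clone decisions. The function name `matthews_corrcoef_from_counts` and the counts in the usage line are hypothetical and chosen only for illustration; they are not results from the SCD-PSM evaluation. Returning 0.0 when the denominator is zero is a common convention, assumed here.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Compute the MCC from confusion-matrix counts.

    tp: clone pairs correctly reported as clones
    tn: non-clone pairs correctly reported as non-clones
    fp: non-clone pairs wrongly reported as clones
    fn: clone pairs that were missed
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column in the
        # confusion matrix yields an MCC of 0.0.
        return 0.0
    return numerator / denominator


# Hypothetical counts for illustration only (not from the article).
print(matthews_corrcoef_from_counts(tp=45, tn=50, fp=2, fn=3))
```

The MCC ranges from -1 (every decision wrong) through 0 (no better than chance) to +1 (every decision right). Because it uses all four cells of the confusion matrix, it remains informative when clones are much rarer than non-clones.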