Project Semiotica




Proprietary software using layered, multi-stage algorithms to detect possible disinformation campaigns on x.com

Categorisation and cluster analysis, as well as individual post and user profiling

Detection of targeted, orchestrated information campaigns online

Dynamic, self-learning, layered pipeline logic

Pipeline self-calibration based on human assessment of data output


How it works

Pipeline Levels:

Level 0: Preprocessing

Language detection, text normalization, tokenization, lemmatization
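A minimal sketch of this stage in Python, using regex tokenization and a toy suffix-stripping lemmatizer as a stand-in for a real NLP library such as spaCy; the function name and the suffix rules are illustrative assumptions, not the production code:

```python
import re
import unicodedata

def preprocess(text: str) -> list[str]:
    # Text normalization: unify unicode forms and lowercase
    text = unicodedata.normalize("NFKC", text).lower()
    # Strip URLs and @-mentions, which carry little lexical signal
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    # Tokenization on word characters
    tokens = re.findall(r"[a-z0-9']+", text)

    # Toy lemmatizer: strip a few common English suffixes
    def lemma(tok: str) -> str:
        for suf in ("ing", "ed", "es", "s"):
            if len(tok) > len(suf) + 2 and tok.endswith(suf):
                return tok[: -len(suf)]
        return tok

    return [lemma(t) for t in tokens]
```

Language detection would sit in front of this function (e.g. a character n-gram classifier) and route each post to language-specific rules.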

Level 1: Keyword Filtering

Exact, fuzzy, and semantic keyword triggers with early exit capability
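The trigger check with early exit can be sketched using the standard library's difflib for fuzzy matching; the trigger set and threshold below are hypothetical, and semantic triggers (which need an embedding model) are omitted:

```python
from difflib import SequenceMatcher

# Hypothetical trigger set for illustration only
TRIGGERS = {"disinfo", "psyop", "falseflag"}

def keyword_hit(tokens: list[str], fuzzy_threshold: float = 0.85) -> bool:
    for tok in tokens:
        # Exact trigger -> early exit, skipping the costlier stages
        if tok in TRIGGERS:
            return True
        # Fuzzy trigger: similarity ratio catches obfuscated spellings
        for trig in TRIGGERS:
            if SequenceMatcher(None, tok, trig).ratio() >= fuzzy_threshold:
                return True
    return False
```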

Level 2: Semantic Similarity

TF-IDF vectorization, cosine similarity, DBSCAN clustering, narrative mapping
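The vectorization and similarity steps can be sketched in pure Python; DBSCAN clustering and narrative mapping would sit on top (typically via scikit-learn) and are omitted, and these helpers are illustrative assumptions rather than the actual implementation:

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[list[str]]) -> list[dict]:
    # docs: list of token lists; returns sparse term -> tf-idf weight maps
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (c / len(doc)) * math.log(n / df[t])
                     for t, c in tf.items()})
    return vecs

def cosine(u: dict, v: dict) -> float:
    # Cosine similarity between two sparse vectors
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Posts whose pairwise cosine similarity is high would then be grouped by DBSCAN into candidate narrative clusters.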

Level 3: Stylometry

POS tagging, sentence length distribution, lexical diversity, readability scores
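A sketch of two of the stylometric features; sentence splitting by terminal punctuation is a naive assumption, and POS tagging plus readability scores (which need syllable counts) are left out:

```python
import re
import statistics

def stylometry(text: str) -> dict:
    # Split into sentences on terminal punctuation (naive)
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Sentence length distribution in words
    lens = [len(re.findall(r"[A-Za-z']+", s)) for s in sents]
    return {
        "mean_sent_len": statistics.mean(lens),
        "sent_len_stdev": statistics.pstdev(lens),
        # Lexical diversity as type-token ratio
        "lexical_diversity": len({w.lower() for w in words}) / len(words),
    }
```

Bot-written or template-driven posts tend to show unusually uniform sentence lengths and low lexical diversity.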

Level 4: Emotional Analysis

Sentiment analysis, rhetorical markers, discourse analysis, modality detection
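A lexicon-based sketch of the sentiment and modality signals; the word lists are tiny made-up stand-ins for real sentiment lexicons or a trained model:

```python
# Toy lexicons for illustration only
POSITIVE = {"great", "win", "truth"}
NEGATIVE = {"lie", "corrupt", "fraud"}
MODALS = {"must", "should", "will", "might"}  # modality markers

def emotional_signals(tokens: list[str]) -> dict:
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    # Polarity in [-1, 1]; 0.0 when no sentiment-bearing words occur
    polarity = (pos - neg) / total if total else 0.0
    # Share of modal verbs as a crude modality-detection signal
    modality = sum(t in MODALS for t in tokens) / len(tokens)
    return {"polarity": polarity, "modality": modality}
```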

Level 5: Pattern Analysis

Topic modeling, burst detection, temporal patterns, Bayesian changepoint detection
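Burst detection can be sketched as a rolling-baseline threshold; the window size and factor are hypothetical tuning parameters, and the heavier Bayesian changepoint detection is not shown:

```python
def detect_bursts(counts: list[int], window: int = 3,
                  factor: float = 2.0) -> list[int]:
    # counts: posts per time bucket (e.g. per hour).
    # Flag buckets whose count exceeds `factor` times the
    # trailing `window`-bucket average.
    bursts = []
    for i in range(window, len(counts)):
        baseline = sum(counts[i - window:i]) / window
        if baseline > 0 and counts[i] > factor * baseline:
            bursts.append(i)
    return bursts
```

A sudden spike of near-identical posts in one bucket is a typical signature of a coordinated push.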

Level 6: Metadata

Engagement-weighted sentiment, account history analysis, temporal burst graphs
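Engagement-weighted sentiment can be sketched as a weighted average; the (sentiment, engagement) tuple shape is an assumption for illustration:

```python
def engagement_weighted_sentiment(posts: list[tuple[float, int]]) -> float:
    # posts: (sentiment in [-1, 1], engagement count such as likes + reposts).
    # Highly amplified posts dominate the aggregate signal.
    total = sum(e for _, e in posts)
    if total == 0:
        return 0.0
    return sum(s * e for s, e in posts) / total
```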

Dynamic Weighted Scoring System

W_i = Weight assigned to each signal (0-1 scale, sum of weights = 1).

S_i = Normalised score (0-1) for each linguistic/statistical feature.

Final score = Σ W_i × S_i → probabilistic confidence:

>0.7 = high risk

0.4–0.7 = medium

<0.4 = low
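The scoring rule above, sketched directly from the definitions; the example weights and signal values are made up:

```python
def final_score(weights: list[float], signals: list[float]) -> tuple[float, str]:
    # weights W_i sum to 1; signals S_i are normalised to [0, 1]
    assert abs(sum(weights) - 1.0) < 1e-9
    score = sum(w * s for w, s in zip(weights, signals))
    # Map the weighted sum onto the risk bands
    if score > 0.7:
        return score, "high"
    if score >= 0.4:
        return score, "medium"
    return score, "low"
```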

What it detects

Most content on X turns out to be benign and is part of healthy debate and regular user activity.

Other content originates from fake accounts on X (bought or taken over), is computer-generated, or repeats the same or similar messages over and over again across accounts.

Our system detects clusters of orchestrated, systematic disinformation using statistical tools.