Pipeline

VersoVector combines supervised and unsupervised NLP workflows.

For a visual explanation of how the modeling components connect across notebooks, see Model Topology.

The goal is to produce an integrated analytical view of poetic text: predicted tags, semantic neighbors, topics, clusters, and projection metadata.

High-level flow

The cleaning stage prepares the corpus for modeling.

Typical responsibilities:

Output examples:

data/poems_processed.csv
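
As a rough illustration of what a cleaning step could look like, here is a minimal sketch using only the Python standard library. The column names `poem_id` and `text`, the helper names, and the specific normalization rules (Unicode NFC, collapsed whitespace, at most one blank line between stanzas, empty poems dropped) are assumptions for this example, not the project's actual implementation.

```python
import csv
import io
import re
import unicodedata


def clean_poem(text: str) -> str:
    """Normalize one poem: Unicode NFC, trimmed lines, no repeated blank lines."""
    text = unicodedata.normalize("NFC", text)
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    cleaned = []
    for line in lines:
        # Keep a single blank line between stanzas, drop the rest.
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned).strip()


def clean_corpus(rows):
    """Clean the text of each row and drop rows that end up empty."""
    out = []
    for row in rows:
        text = clean_poem(row["text"])
        if text:
            out.append({**row, "text": text})
    return out


# Hypothetical raw rows; the real corpus schema may differ.
raw = [
    {"poem_id": "p1", "text": "  Red   rose \n\n\n in the  garden  "},
    {"poem_id": "p2", "text": "   \n  "},
]
processed = clean_corpus(raw)

# Same CSV shape as a file like data/poems_processed.csv, written to memory
# here so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["poem_id", "text"])
writer.writeheader()
writer.writerows(processed)
```

Keeping the blank line between stanzas is a deliberate choice in this sketch: stanza breaks carry structure in poetry, so collapsing them entirely would discard information that later stages might use.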

The feature pipeline transforms poems into numeric representations.

The project uses classic NLP representations such as:

The feature pipeline is shared by supervised and unsupervised modeling stages.
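
To make the idea of a shared feature pipeline concrete, the sketch below fits a TF-IDF representation once and reuses the same vocabulary and weights for any downstream stage. TF-IDF is assumed here as one example of a classic representation; the tokenizer, the smoothed IDF formula, and the function names are illustrative choices, not taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def fit_tfidf(docs):
    """Return (vocabulary, idf) learned from a list of raw strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF so that no term gets a zero or infinite weight.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0 for tok in vocab}
    return vocab, idf


def transform(doc, vocab, idf):
    """Map one document to an L2-normalized TF-IDF vector over vocab."""
    counts = Counter(tokenize(doc))
    vec = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# Hypothetical mini-corpus.
poems = ["the sea is dark", "the sea is calm", "a rose in the garden"]
vocab, idf = fit_tfidf(poems)
features = [transform(p, vocab, idf) for p in poems]
```

Because `vocab` and `idf` are fitted once, both the supervised and the unsupervised branch can call `transform` and receive vectors in the same space, which is what makes their outputs comparable later.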

The supervised branch predicts multilabel tags.

It is used to assign interpretable emotional or thematic labels to poems.

Typical outputs:
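
The multilabel setting can be illustrated with a small sketch of how tag sets are encoded for training and how per-tag scores are decoded back into tags. The tag names, the scores, and the 0.5 threshold are hypothetical values chosen for the example; the project's actual classifier and output format are not shown here.

```python
def binarize_tags(tag_sets):
    """Turn a list of tag sets into (sorted tag names, 0/1 indicator rows)."""
    tags = sorted(set().union(*tag_sets))
    rows = [[1 if t in s else 0 for t in tags] for s in tag_sets]
    return tags, rows


def decode_scores(scores, tags, threshold=0.5):
    """Keep every tag whose score reaches the threshold.

    Unlike single-label classification, a poem may receive several tags or none.
    """
    return [t for t, s in zip(tags, scores) if s >= threshold]


# Hypothetical training labels: each poem carries a set of tags.
train_tags = [{"love", "nature"}, {"grief"}, {"nature"}]
tags, y = binarize_tags(train_tags)

# Hypothetical per-tag scores for one new poem, in the same order as `tags`.
scores = [0.10, 0.72, 0.55]
predicted = decode_scores(scores, tags)
```

Each column of `y` can be treated as an independent binary target, which is the usual one-vs-rest way to frame multilabel prediction.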

The unsupervised branch discovers structure in the corpus without relying solely on the tag labels.

It may generate:
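
Semantic neighbors, one of the outputs named in the project goal, can be sketched as a cosine-similarity lookup over poem vectors. The vectors and poem ids below are made up for illustration, and ranking by cosine similarity is an assumption about how neighbors might be computed, not a description of the project's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbors(vectors, query_id, k=2):
    """Return the k ids most similar to query_id, best first."""
    query = vectors[query_id]
    scored = [
        (cosine(query, vec), pid)
        for pid, vec in vectors.items()
        if pid != query_id
    ]
    # Sort by descending similarity; break ties by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:k]]


# Hypothetical feature vectors keyed by poem id.
vectors = {
    "p1": [1.0, 0.0, 1.0],
    "p2": [0.9, 0.1, 0.8],
    "p3": [0.0, 1.0, 0.0],
    "p4": [0.5, 0.5, 0.5],
}
neighbors = nearest_neighbors(vectors, "p1")
```

No labels are involved at any point: the neighbor lists depend only on the feature vectors, which is what distinguishes this branch from the supervised one.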

The integration stage combines supervised and unsupervised outputs into a single analytical table.

Typical integrated fields:
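
A minimal sketch of the integration step, assuming each stage produces a mapping keyed by a shared poem id. The function name, the field names, and the sample values are hypothetical; the point is only that every poem ends up as one row with the same columns.

```python
def integrate(poem_ids, **outputs):
    """Build one row per poem from several {poem_id: value} mappings.

    Missing values become None so every row has the same columns.
    """
    return [
        {"poem_id": pid, **{name: values.get(pid) for name, values in outputs.items()}}
        for pid in poem_ids
    ]


# Hypothetical outputs from the supervised and unsupervised branches.
predicted_tags = {"p1": ["love", "nature"], "p2": ["grief"]}
clusters = {"p1": 0, "p2": 1}
neighbors = {"p1": ["p2"]}

table = integrate(
    ["p1", "p2"],
    predicted_tags=predicted_tags,
    cluster=clusters,
    neighbors=neighbors,
)
```

Filling gaps with `None` rather than dropping the row keeps poems visible in the final table even when one branch produced nothing for them.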

The final visualization stage helps interpret the model outputs.

Useful visualizations include: