Embedding Similarity

This component helps Nappai understand how similar two pieces of data are. Imagine you have two lists of words or numbers representing different things; this component figures out how closely related those two lists are. It does this by using different ways of comparing them, allowing you to choose the best method for your data.

Relationship with Nappai’s AI

This component uses Nappai’s underlying AI capabilities to process and compare data vectors. These vectors are numerical representations of your data, created by Nappai’s AI engine. The component then applies mathematical formulas to determine the similarity score.

Inputs

Embedding Vectors: This input requires two sets of data, represented as “embedding vectors.” Think of these as numerical descriptions of your data. Nappai automatically creates these vectors; you don’t need to worry about the technical details. Just make sure you provide exactly two sets of data.
Similarity Metric: This is a dropdown menu where you choose how the component compares the two data sets. You can select from:
- Cosine Similarity: A common method that measures the cosine of the angle between the two vectors, ignoring their length. A score closer to 1 means more similar.
- Euclidean Distance: Measures the straight-line distance between the data sets. Smaller distance means more similar.
- Manhattan Distance: Measures the distance between the data sets by summing the absolute differences of their coordinates. Smaller distance means more similar.
The default is “Cosine Similarity”.
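For readers who want to see the arithmetic behind the three options, here is a minimal sketch in plain Python. It uses the standard textbook formulas and is not taken from Nappai's source; the function names are hypothetical, and both vectors are assumed to have the same length.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (textbook formula)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Note the direction of each result: a higher cosine value means more similar, while a higher Euclidean or Manhattan value means less similar.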

Outputs

The component produces a single output:

Similarity Data: This output shows the results of the comparison. It includes the original two data sets and a single number representing how similar they are. For Cosine Similarity, this number mathematically lies between -1 and 1 (in practice, scores for embedding vectors usually fall between 0 and 1), and a higher value means more similar. For Euclidean and Manhattan distances, the number is 0 or greater, and a larger value means less similar. You can then use this similarity score in other parts of your Nappai workflow.

Usage Example

Let’s say you have two product descriptions. You want to know how similar they are. You would feed the embedding vectors representing these descriptions into the “Embedding Vectors” input. Then, select your preferred similarity metric (e.g., Cosine Similarity). The “Similarity Data” output will give you a score indicating how similar the descriptions are. A score close to 1 (for Cosine Similarity) indicates high similarity.
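The workflow above can be imitated outside Nappai with a short sketch. Everything here is an assumption made for illustration: the `compare` function, the dictionary keys in its return value, and the three-number toy vectors (real embeddings have hundreds of dimensions) are hypothetical and do not describe the component's actual output format.

```python
import math

def compare(vectors, metric="Cosine Similarity"):
    """Compare exactly two equal-length vectors and return a result record."""
    if len(vectors) != 2:
        raise ValueError("exactly two embedding vectors are required")
    a, b = vectors
    if len(a) != len(b):
        raise ValueError("both vectors must have the same length")

    if metric == "Cosine Similarity":
        dot = sum(x * y for x, y in zip(a, b))
        score = dot / (math.hypot(*a) * math.hypot(*b))
    elif metric == "Euclidean Distance":
        score = math.dist(a, b)
    elif metric == "Manhattan Distance":
        score = sum(abs(x - y) for x, y in zip(a, b))
    else:
        raise ValueError(f"unknown metric: {metric}")

    # Hypothetical record: the two inputs plus the single score.
    return {"embedding_1": a, "embedding_2": b, "metric": metric, "score": score}

# Toy stand-ins for the embeddings of two product descriptions.
description_a = [0.20, 0.70, 0.10]
description_b = [0.25, 0.65, 0.10]

result = compare([description_a, description_b])
print(result["metric"], round(result["score"], 4))
```

With these toy vectors the cosine score comes out above 0.99, which matches the "close to 1 means highly similar" reading described above.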

Templates

[List of templates where the component can be seen and its configuration – To be filled in by the system administrator]

[Links to other related components and a brief description of each – To be filled in by the system administrator]

Tips and Best Practices

Make sure you provide exactly two sets of data in the “Embedding Vectors” input.
Experiment with different similarity metrics to see which one works best for your data.
The meaning of the similarity score depends on the chosen metric. Refer to the metric descriptions above for interpretation.
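To illustrate why the score depends on the metric, the sketch below compares two made-up vectors that point in the same direction but differ in length. It uses only the Python standard library and is an illustration, not Nappai code.

```python
import math

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude

# Cosine similarity ignores vector length, so it treats a and b as identical.
cosine = sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

# The distance metrics are sensitive to length, so they report a gap.
euclidean = math.dist(a, b)
manhattan = sum(abs(x - y) for x, y in zip(a, b))

print(f"cosine={cosine:.4f} euclidean={euclidean:.4f} manhattan={manhattan:.4f}")
```

Here the cosine score is approximately 1 (maximally similar), while the Euclidean distance is about 3.74 and the Manhattan distance is 6. If the magnitude of your vectors carries meaning, a distance metric may suit your data better; if only direction matters, cosine similarity is usually the simpler choice.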

Security Considerations

[To be filled in by the system administrator if applicable]