Python makes it pretty easy to calculate cosine similarity, and once you know how it works, you’ll find yourself using it all the time. Whether you're working with text data, user behavior, or item recommendations, cosine similarity offers a simple way to measure how alike two sets of data are. Let’s take a closer look at how it’s done.
Cosine similarity compares the angle between two non-zero vectors in a high-dimensional space. Rather than paying attention to the magnitude (size) of the vectors, it compares their direction. This comes in particularly handy when you want to check how similar two documents or data points are without being sidetracked by differences in size.
Imagine it this way: two essays could be of different lengths, but if they're discussing the same thing with the same words, cosine similarity will catch that. If the vectors point in exactly the same direction, the score is 1. If they have nothing in common (the vectors are perpendicular), the score is 0, and vectors pointing in opposite directions can score as low as -1. With non-negative data like word counts, scores stay between 0 and 1, and anything in between tells you how close the two are.
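To make that concrete, here's a tiny, made-up example: three "essays" reduced to counts of the same three words. The counts are invented purely for illustration.

```python
import numpy as np

def cosine(a, b):
    # dot product divided by the product of the vector lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up word counts for the words ["python", "code", "recipe"]
essay_a = np.array([4, 2, 0])   # talks about python and code
essay_b = np.array([8, 4, 0])   # same topics, just twice as long
essay_c = np.array([0, 0, 5])   # talks only about recipes

print(cosine(essay_a, essay_b))  # 1.0 -> same direction, length ignored
print(cosine(essay_a, essay_c))  # 0.0 -> nothing in common
```

Notice that essay_b is just essay_a doubled, yet the score is still a perfect 1.0, because only the direction matters.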
Now that you've got the basic idea, let's jump into the different ways you can calculate cosine similarity in Python. Whether you like keeping things simple with built-in libraries or prefer writing a bit more code by hand, there's something for everyone.
One of the quickest ways to calculate cosine similarity is by using the scikit-learn library. Here’s how you can do it:
```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
# Example vectors
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
# Reshape for sklearn (it expects 2D arrays)
vector1 = vector1.reshape(1, -1)
vector2 = vector2.reshape(1, -1)
# Calculate cosine similarity
similarity = cosine_similarity(vector1, vector2)
print(similarity[0][0])
```
Pretty clean, right? scikit-learn handles all the heavy lifting behind the scenes, and you just get your answer.
If you are already working with scipy, you can calculate cosine similarity a little differently. Here’s a quick look:
```python
from scipy import spatial
# Example vectors
vector1 = [1, 2, 3]
vector2 = [4, 5, 6]
# Calculate cosine similarity
similarity = 1 - spatial.distance.cosine(vector1, vector2)
print(similarity)
```
Notice that spatial.distance.cosine actually gives you the cosine distance. That's why we subtract it from 1 to get cosine similarity.
If you like to see the math happening, you can calculate cosine similarity manually. It’s not that bad! Here’s a simple version:
```python
import numpy as np
# Example vectors
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
# Calculate the dot product
dot_product = np.dot(vector1, vector2)
# Calculate magnitudes
magnitude1 = np.linalg.norm(vector1)
magnitude2 = np.linalg.norm(vector2)
# Calculate cosine similarity
similarity = dot_product / (magnitude1 * magnitude2)
print(similarity)
```
Manual calculation can be really handy if you want to understand exactly what’s going on instead of relying on libraries.
Sometimes, it’s not just two vectors—you may have a whole bunch and need to find pairwise cosine similarities. Here’s how you can handle that:
```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
# List of vectors
vectors = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0]
])
# Calculate pairwise cosine similarity
similarity_matrix = cosine_similarity(vectors)
print(similarity_matrix)
```
The result will be a matrix showing how similar each vector is to the others. Super useful for clustering, recommendation systems, and even finding duplicates.
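One quick way to use that matrix (sticking with the same toy vectors) is to pull out the closest match for each row. This is a minimal sketch rather than a full deduplication pipeline; the np.fill_diagonal step is just there so a vector doesn't get matched with itself.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

vectors = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0]
])

similarity_matrix = cosine_similarity(vectors)

# Blank out self-similarity so argmax finds a *different* vector
np.fill_diagonal(similarity_matrix, -1.0)
best_match = similarity_matrix.argmax(axis=1)

for i, j in enumerate(best_match):
    print(f"Vector {i} is most similar to vector {j} "
          f"(score: {similarity_matrix[i, j]:.3f})")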
If you’re working with machine learning libraries like TensorFlow or PyTorch, they both offer quick ways to calculate cosine similarity, too.
Here's the TensorFlow version:
```python
import tensorflow as tf
vector1 = tf.constant([1, 2, 3], dtype=tf.float32)
vector2 = tf.constant([4, 5, 6], dtype=tf.float32)
# Keras treats this as a loss, so it returns the *negative* cosine similarity
cos_sim = tf.keras.losses.cosine_similarity(vector1, vector2)
similarity = -cos_sim.numpy()
print(similarity)
```
And here's the PyTorch equivalent, normalizing the vectors yourself and taking a dot product:
```python
import torch
vector1 = torch.tensor([1, 2, 3], dtype=torch.float32)
vector2 = torch.tensor([4, 5, 6], dtype=torch.float32)
vector1_norm = vector1 / vector1.norm()
vector2_norm = vector2 / vector2.norm()
similarity = torch.dot(vector1_norm, vector2_norm).item()
print(similarity)
```
These options are handy when your project is already using deep learning libraries, and you don't want to import something extra.
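Also worth knowing: PyTorch ships a built-in helper, torch.nn.functional.cosine_similarity, which handles the normalization for you. For plain 1-D tensors like the ones above, you just point it at dimension 0:

```python
import torch
import torch.nn.functional as F

vector1 = torch.tensor([1, 2, 3], dtype=torch.float32)
vector2 = torch.tensor([4, 5, 6], dtype=torch.float32)

# dim=0 because these are single 1-D vectors, not a batch
similarity = F.cosine_similarity(vector1, vector2, dim=0)
print(similarity.item())
```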
Cosine similarity works best when magnitude doesn't matter but direction does. It's perfect for:
- Comparing documents or other text data
- Recommendation systems and user-behavior comparisons
- Clustering and finding near-duplicates
If you are dealing with data that is mostly sparse (like text or user behavior), cosine similarity often gives much better results than other metrics like Euclidean distance. A quick text-based sketch follows below.
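As a rough sketch of the text case (the example sentences are made up), scikit-learn's TfidfVectorizer turns raw strings into sparse vectors that cosine_similarity can consume directly:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up documents purely for illustration
docs = [
    "python makes cosine similarity easy",
    "cosine similarity is easy in python",
    "the weather is nice today"
]

# Each row of the result is a sparse TF-IDF vector for one document
tfidf = TfidfVectorizer().fit_transform(docs)

# Pairwise similarities: the first two documents score much higher
# with each other than with the third
print(cosine_similarity(tfidf))
```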
While cosine similarity is powerful, it's not the best choice for every situation. Here are a few things you should keep in mind:
- It ignores magnitude completely, so two vectors with very different scales can still score as identical.
- It's undefined for zero vectors, since the formula divides by the product of the vector lengths (see the short sketch below).
- The scores are only as meaningful as the features themselves; with raw text counts, a weighting scheme like TF-IDF usually helps.
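The zero-vector caveat is easy to see with the manual formula from earlier; this is just a quick illustration of the division by zero:

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])
zero_vector = np.zeros(3)

# The denominator is zero, so NumPy warns and the result is nan
similarity = np.dot(vector, zero_vector) / (
    np.linalg.norm(vector) * np.linalg.norm(zero_vector)
)
print(similarity)  # nan
```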
These points don't make cosine similarity bad; you just need to make sure it fits the kind of data you are working with.
Learning how to calculate cosine similarity in Python opens up a lot of new possibilities for your projects. Whether you're using scikit-learn, scipy, or doing it manually, the concept stays the same: compare the direction of vectors, not their size. Once you start working with it, you’ll find cosine similarity becomes a natural part of your toolbox whenever you need to measure similarity. It’s a lightweight technique that can fit into recommendation engines, search engines, and even everyday data comparison tasks. Knowing a few different ways to calculate it means you can easily adapt depending on what libraries you’re already using.