Calculating Cosine Similarity in Python with Simple Examples

Apr 25, 2025 By Tessa Rodriguez

Python makes it pretty easy to calculate cosine similarity, and once you know how it works, you’ll find yourself using it all the time. Whether you're working with text data, user behavior, or item recommendations, cosine similarity offers a simple way to measure how alike two sets of data are. Let’s take a closer look at how it’s done.

What Is Cosine Similarity, and Why Use It?

Cosine similarity measures the cosine of the angle between two non-zero vectors, often in a high-dimensional space. Rather than paying attention to the magnitude (size) of the vectors, it compares their direction. This comes in particularly handy when you want to check how similar two documents or data points are without being sidetracked by differences in size.

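Imagine it this way: two essays could be of different lengths, but if they discuss the same thing using the same words in similar proportions, cosine similarity will catch that. If two vectors point in exactly the same direction, the score is 1. If they have nothing in common (they are orthogonal), the score is 0. For non-negative data such as word counts, every score falls between 0 and 1. If the vectors can contain negative values, the score can drop as low as -1, which means they point in opposite directions.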
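To make those scores concrete, here is a small sketch using NumPy. The helper name `cosine_sim` and the word-count vectors are made up for illustration, and the helper assumes neither input is all zeros.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two non-zero vectors (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

short_essay = [2, 1, 0]   # hypothetical word counts
long_essay = [6, 3, 0]    # same proportions, three times longer
unrelated = [0, 0, 5]     # shares no words with the other two

same_direction = cosine_sim(short_essay, long_essay)  # approximately 1.0
no_overlap = cosine_sim(short_essay, unrelated)       # 0.0
print(same_direction, no_overlap)
```

The longer essay scores about 1 against the shorter one even though its counts are three times larger, because only the direction of the vectors is compared.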

Different Ways to Calculate Cosine Similarity in Python

Now that you have the basic idea, let's jump into the different ways you can calculate cosine similarity using Python. Whether you like keeping things simple with existing libraries or you prefer writing a bit more code by hand, there's something for everyone.

Using scikit-learn

One of the quickest ways to calculate cosine similarity is by using the scikit-learn library. Here’s how you can do it:

```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Example vectors
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

# Reshape for sklearn (it expects 2D arrays)
vector1 = vector1.reshape(1, -1)
vector2 = vector2.reshape(1, -1)

# Calculate cosine similarity
similarity = cosine_similarity(vector1, vector2)
print(similarity[0][0])
```

Pretty clean, right? scikit-learn handles all the heavy lifting behind the scenes, and you just get your answer.

Using scipy

If you are already working with scipy, you can calculate cosine similarity a little differently. Here’s a quick look:

```python
from scipy import spatial

# Example vectors
vector1 = [1, 2, 3]
vector2 = [4, 5, 6]

# Calculate cosine similarity
similarity = 1 - spatial.distance.cosine(vector1, vector2)
print(similarity)
```

Notice that spatial.distance.cosine actually gives you the cosine distance. That's why we subtract it from 1 to get cosine similarity.

Manual Calculation

If you like to see the math happening, you can calculate cosine similarity manually. It’s not that bad! Here’s a simple version:

```python
import numpy as np

# Example vectors
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

# Calculate the dot product
dot_product = np.dot(vector1, vector2)

# Calculate magnitudes
magnitude1 = np.linalg.norm(vector1)
magnitude2 = np.linalg.norm(vector2)

# Calculate cosine similarity
similarity = dot_product / (magnitude1 * magnitude2)
print(similarity)
```

Manual calculation can be really handy if you want to understand exactly what’s going on instead of relying on libraries.
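The same formula can also be written without NumPy. The sketch below uses only the standard library. `cosine_similarity_plain` is a hypothetical name, and the function assumes both inputs have the same length and neither is all zeros.

```python
import math

def cosine_similarity_plain(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot_product = sum(x * y for x, y in zip(a, b))
    magnitude_a = math.sqrt(sum(x * x for x in a))
    magnitude_b = math.sqrt(sum(y * y for y in b))
    return dot_product / (magnitude_a * magnitude_b)

result = cosine_similarity_plain([1, 2, 3], [4, 5, 6])
print(round(result, 4))  # 0.9746
```

For these vectors the dot product is 32 and the product of the magnitudes is the square root of 14 × 77 = 1078, which gives roughly 0.9746.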

Calculating Cosine Similarity Between Multiple Vectors

Sometimes it's not just two vectors: you may have a whole collection and need every pairwise cosine similarity. If you pass a single 2D array to scikit-learn's cosine_similarity, it compares every row with every other row:

```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# List of vectors (one per row)
vectors = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0]
])

# Calculate pairwise cosine similarity
similarity_matrix = cosine_similarity(vectors)
print(similarity_matrix)
```

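The result is a square matrix in which the entry in row i, column j is the similarity between vector i and vector j. The diagonal is all 1s because every vector points in the same direction as itself. This is useful for clustering, recommendation systems, and even finding duplicates.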
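One way to read such a matrix is to look up, for each vector, which other vector scores highest. The sketch below rebuilds the matrix with plain NumPy (scale each row to unit length, then multiply by the transpose) so it does not depend on scikit-learn. The variable names are illustrative, and it assumes no row is all zeros.

```python
import numpy as np

vectors = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
], dtype=float)

# Scale every row to unit length; dot products of unit vectors are cosines
unit_rows = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarity_matrix = unit_rows @ unit_rows.T

# Ignore each vector's similarity with itself before searching
off_diagonal = similarity_matrix.copy()
np.fill_diagonal(off_diagonal, -np.inf)
closest = off_diagonal.argmax(axis=1)

print(np.round(similarity_matrix, 3))
print(closest)  # index of the most similar other vector for each row
```

The first two vectors are orthogonal to each other, so each of them finds the third vector (index 2) as its closest match, with a similarity of about 0.707.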

Using TensorFlow or PyTorch

If you’re working with machine learning libraries like TensorFlow or PyTorch, they both offer quick ways to calculate cosine similarity, too.

TensorFlow Example:

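```python
import tensorflow as tf

vector1 = tf.constant([1, 2, 3], dtype=tf.float32)
vector2 = tf.constant([4, 5, 6], dtype=tf.float32)

# The Keras loss returns the NEGATIVE cosine similarity,
# so that minimizing the loss maximizes the similarity
cos_sim = tf.keras.losses.cosine_similarity(vector1, vector2)

# Flip the sign to recover the similarity itself
similarity = -cos_sim.numpy()
print(similarity)
```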

PyTorch Example:

```python
import torch

vector1 = torch.tensor([1, 2, 3], dtype=torch.float32)
vector2 = torch.tensor([4, 5, 6], dtype=torch.float32)

# Scale each vector to unit length
vector1_norm = vector1 / vector1.norm()
vector2_norm = vector2 / vector2.norm()

# The dot product of unit vectors is the cosine similarity
similarity = torch.dot(vector1_norm, vector2_norm).item()
print(similarity)
```

These options are handy when your project is already using deep learning libraries, and you don't want to import something extra.

When Should You Use Cosine Similarity?

Cosine similarity works best when magnitude doesn't matter but direction does. It's perfect for:

  • Comparing documents
  • Matching user interests
  • Finding similar products
  • Checking similarity in search results
  • Filtering spam content

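If you are dealing with data that is mostly sparse (like text or user behavior), cosine similarity often gives more meaningful results than magnitude-sensitive metrics like Euclidean distance, because a long document and a short one on the same topic still point in a similar direction.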
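The difference from Euclidean distance shows up even on a toy example. The following sketch uses only the standard library. The three "documents", the whitespace tokenizer, and the helper names are all invented for illustration and are far simpler than a real text pipeline.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count lowercase, whitespace-separated words (a deliberately simple tokenizer)."""
    return Counter(text.lower().split())

def cosine_from_counts(a, b):
    """Cosine similarity of two non-empty word-count mappings."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b)

def euclidean_from_counts(a, b):
    """Euclidean distance between two word-count mappings."""
    return math.sqrt(sum((a[word] - b[word]) ** 2 for word in a.keys() | b.keys()))

short_doc = bag_of_words("cats chase mice")
long_doc = bag_of_words("cats chase mice " * 3)   # same content, repeated
other_doc = bag_of_words("stocks fell sharply")

cos_same_topic = cosine_from_counts(short_doc, long_doc)
cos_other_topic = cosine_from_counts(short_doc, other_doc)
dist_same_topic = euclidean_from_counts(short_doc, long_doc)
dist_other_topic = euclidean_from_counts(short_doc, other_doc)

print(cos_same_topic, cos_other_topic)
print(dist_same_topic, dist_other_topic)
```

Here the Euclidean distance ranks the unrelated document as closer (about 2.45 versus about 3.46) purely because of length, while cosine similarity gives about 1.0 for the repeated document and 0.0 for the unrelated one.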

Things to Watch Out For

While cosine similarity is powerful, it’s not the best choice for every situation. Here are a few things you should keep in mind:

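  • It does not consider magnitude, so a small vector and a large vector pointing in the same direction are treated as identical. If size matters for your problem, you need a different or additional metric.
  • It is undefined for all-zero vectors, because the formula divides by the vector's magnitude. Filter out or handle empty rows before comparing them.
  • It can return negative scores. If your vectors contain negative values (for example, mean-centered ratings or learned embeddings), scores range from -1 to 1, so don't assume 0 is the lowest possible value.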

These points don't make cosine similarity bad; you just need to make sure it fits the kind of data you are working with.
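The zero-vector case in particular is easy to guard against. The sketch below shows one possible convention, not a standard: the hypothetical `safe_cosine_similarity` returns a chosen default (0.0 here) when either vector has zero length, where a plain division would produce NaN. It also shows that vectors pointing in opposite directions score about -1.

```python
import numpy as np

def safe_cosine_similarity(a, b, default=0.0):
    """Cosine similarity that returns `default` if either vector is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0.0:
        return default  # the angle is undefined, so fall back to a chosen value
    return float(np.dot(a, b) / norm_product)

zero_case = safe_cosine_similarity([0, 0, 0], [1, 2, 3])        # default, 0.0
scaled_case = safe_cosine_similarity([1, 2, 3], [10, 20, 30])   # approximately 1.0
opposite_case = safe_cosine_similarity([1, 2], [-1, -2])        # approximately -1.0
print(zero_case, scaled_case, opposite_case)
```

Whether 0.0 is the right fallback depends on the application. In some pipelines it is cleaner to drop empty vectors before comparing anything.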

Wrapping Up!

Learning how to calculate cosine similarity in Python opens up a lot of new possibilities for your projects. Whether you're using scikit-learn, scipy, or doing it manually, the concept stays the same: compare the direction of vectors, not their size. Once you start working with it, you’ll find cosine similarity becomes a natural part of your toolbox whenever you need to measure similarity. It’s a lightweight technique that can fit into recommendation engines, search engines, and even everyday data comparison tasks. Knowing a few different ways to calculate it means you can easily adapt depending on what libraries you’re already using.
