12/10/2024
I’ve been circling around the idea of RightMark for a while now—a system that provides a measurable, objective standard to determine if a new creative work is too close to existing copyrighted material. It’s that simple. And the more I think about it, the more obvious it feels: content similarity can be measured, we have the tools, so why not apply them to the murky waters of copyright and originality?
RightMark is a proposed framework that vectorizes creative works (text, images, music, etc.) into a high-dimensional space and uses metrics like cosine similarity to measure how close a new work is to existing material. If you’re inside some defined “bubble” of similarity, that could mean infringement. If you’re comfortably outside that zone, you’ve earned your “RightMark”—a sort of originality certification.
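Before getting into the framework itself, here's the core measurement in miniature. The vectors below are toy values; real embeddings have hundreds of dimensions:

import numpy as np

a = np.array([0.9, 0.1, 0.3])  # embedding of an existing work (toy values)
b = np.array([0.8, 0.2, 0.4])  # embedding of a candidate new work (toy values)
cos_sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cos_sim)  # near 1.0 -> inside the "bubble"; lower -> further from the reference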
The idea is to provide a quantitative measure for what’s traditionally subjective and legally tangled. Instead of wading through endless “is this too similar?” debates, RightMark offers a clear line in the sand (or rather, a defined radius in vector space).
While I’m not going to dive into a full implementation, here’s a rough sketch to give you a feel for how one might structure the heart of RightMark:
from typing import List

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


class RightMarkEngine:
    def __init__(self, reference_embeddings: List[np.ndarray], threshold: float):
        """
        reference_embeddings: List of vectorized embeddings representing known copyrighted works
        threshold: A cosine-similarity ceiling that defines the boundary of allowed originality.
            e.g., with threshold=0.8, a work is considered original only if its cosine
            similarity to every reference work is below 0.8 (equivalently, its cosine
            distance to every reference work is greater than 0.2).
        """
        self.reference_embeddings = np.array(reference_embeddings)
        self.threshold = threshold

    def compute_embedding(self, new_work: str) -> np.ndarray:
        """
        Placeholder for actual embedding logic.
        In practice, you'd call a model like Sentence-BERT or a domain-specific embedding generator.
        """
        # Pseudo-code: embedding = model.encode(new_work)
        embedding = np.random.rand(768)  # mock embedding for illustration
        return embedding

    def similarity_score(self, new_work: str) -> float:
        """
        For more detail, return the maximum similarity score against the reference set,
        not just a boolean.
        """
        new_embedding = self.compute_embedding(new_work)
        similarities = cosine_similarity(new_embedding.reshape(1, -1), self.reference_embeddings)
        return float(np.max(similarities))

    def assess_originality(self, new_work: str) -> bool:
        """
        Returns True if the new_work is considered original enough by RightMark standards.
        """
        # Cosine similarity ranges from -1 to 1, where 1 means identical.
        # If even the closest known work scores below the threshold, the work is original.
        return self.similarity_score(new_work) < self.threshold
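To see the intended flow end to end, here's how one might exercise the engine. The reference embeddings and the 0.8 threshold are placeholders, not calibrated values:

engine = RightMarkEngine(
    reference_embeddings=[np.random.rand(768) for _ in range(100)],  # stand-in corpus
    threshold=0.8,  # illustrative ceiling; real values would come from calibration
)
draft = "A short story about a detective who dreams in color."
print(engine.similarity_score(draft))    # maximum similarity to any reference work
print(engine.assess_originality(draft))  # True if that maximum stays below 0.8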
Note: This is a stripped-down conceptual skeleton. In reality, you’d have:
- A reliable embedding function (e.g., Sentence Transformers for text, CLIP for images, or specialized embeddings for music).
- Domain-appropriate normalization and preprocessing.
- A set of reference embeddings stored efficiently, likely indexed with a vector database (like Faiss or Pinecone) for scalability; see the Faiss sketch after this section.

Governance and Adaptation

One of the core challenges is determining the threshold. That’s where a governing body—the RightMark Foundation—steps in. Imagine a consortium of domain experts, creators, and legal advisers who periodically test and adjust thresholds based on real-world feedback, human judgments, and evolving standards. Different media might have different thresholds, and over time these can be recalibrated.
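What might that recalibration look like in practice? Here's a minimal sketch of one approach: sweep candidate thresholds and keep the one that best agrees with human judgments on labeled examples. Everything here (the scores, the labels, the accuracy metric) is invented for illustration, not a proposal for how the Foundation would actually operate:

import numpy as np

# Hypothetical data: maximum-similarity scores for works that human reviewers
# labeled as too derivative (1) or acceptably original (0).
scores = np.array([0.92, 0.88, 0.75, 0.60, 0.55, 0.40])
labels = np.array([1, 1, 1, 0, 0, 0])

best_threshold, best_accuracy = 0.0, 0.0
for t in np.linspace(0.0, 1.0, 101):         # candidate thresholds 0.00, 0.01, ..., 1.00
    predictions = (scores >= t).astype(int)  # flag works at or above t as too similar
    accuracy = float(np.mean(predictions == labels))
    if accuracy > best_accuracy:
        best_threshold, best_accuracy = t, accuracy

print(best_threshold, best_accuracy)  # the cut that best matches the reviewers' judgments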
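And picking up the vector-database point from the list above: at scale, you wouldn't loop over every reference embedding on each check. Here's a rough sketch with Faiss, where the index type, corpus size, and dimensionality are all assumptions for illustration:

import numpy as np
import faiss  # pip install faiss-cpu

d = 768                                             # embedding dimensionality
refs = np.random.rand(10_000, d).astype("float32")  # stand-in for a real reference corpus
faiss.normalize_L2(refs)        # L2-normalize so inner product equals cosine similarity
index = faiss.IndexFlatIP(d)    # exact inner-product index
index.add(refs)

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
similarities, ids = index.search(query, 5)  # top-5 closest reference works
print(similarities[0], ids[0])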
Why This is Exciting

RightMark isn’t just about enforcing rules—it’s about clarifying them. By giving creators immediate, objective feedback on how original their work is (compared to a large database of known content), we encourage experimentation and reduce accidental infringement.
Instead of feeling like some ephemeral “vibe check,” originality becomes a measurable criterion. Yes, “measurable originality” sounds paradoxical, but it’s really just an extension of what we do in search engines, recommendation systems, and plagiarism checks—now applied to the rich and nuanced world of creative production.
A Potential Future

As RightMark matures, we might see:

- Educational Tools: Students can quickly check if their essays are too derivative.
- Legal References: Courts consult RightMark reports as part of their decision-making process.
- Cultural Indices: We could measure how trends shift over time, how close new works cluster around certain styles, and detect periods of creative explosion.

And let’s not forget personal use-cases—like scanning the web to spot deep fakes or unauthorized use of your personal content.
Wrapping Up

RightMark feels obvious to me now, because once you start thinking about originality as a spatial concept—embeddings, distances, thresholds—it all clicks. It’s one of those ideas that makes you wonder why we haven’t done it yet. But that’s the beauty of it: sometimes the best concepts are the ones that, once articulated, feel like they were always waiting there in the background.
For more technical deep dives on embeddings and similarity search, I’ve always appreciated Nils Reimers’s Sentence Transformers project, whose documentation offers a practical, well-tested approach to embeddings and similarity scoring. While the project focuses on NLP, the principles extend across domains.
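Swapping the mock compute_embedding above for a real model is only a few lines. A minimal sketch with Sentence Transformers, where the model name is just one common default rather than a specific recommendation:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # compact general-purpose text encoder
embeddings = model.encode(
    ["a new short story about a detective", "an existing story from the reference corpus"],
    convert_to_tensor=True,
)
score = util.cos_sim(embeddings[0], embeddings[1])  # same cosine similarity the engine uses
print(float(score))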
In the end, RightMark is about empowering creators and users with clarity. It turns guesswork into a measurable factor, encouraging honesty, innovation, and a healthier creative ecosystem. And to me, that’s worth carving a new space in the conversation.