12/10/2024
I’ve been circling around the idea of RightMark for a while now—a system that provides a measurable, objective standard to determine if a new creative work is too close to existing copyrighted material. It’s that simple. And the more I think about it, the more obvious it feels: content similarity can be measured, we have the tools, so why not apply them to the murky waters of copyright and originality?
RightMark is a proposed framework that vectorizes creative works (text, images, music, etc.) into a high-dimensional space and uses metrics like cosine similarity to measure how close a new work is to existing material. If you’re inside some defined “bubble” of similarity, that could mean infringement. If you’re comfortably outside that zone, you’ve earned your “RightMark”—a sort of originality certification.
The idea is to provide a quantitative measure for what’s traditionally subjective and legally tangled. Instead of wading through endless “is this too similar?” debates, RightMark offers a clear line in the sand (or rather, a defined radius in vector space).
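To make that "defined radius in vector space" concrete, here's a tiny numeric illustration with made-up three-dimensional vectors (real embeddings would have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the vectors divided by their norms.
    # Ranges from -1 to 1; 1 means the vectors point the same way.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

reference = np.array([1.0, 0.0, 0.0])   # a known work's embedding (toy values)
near_copy = np.array([0.9, 0.1, 0.0])   # a new work pointing almost the same way
different = np.array([0.0, 1.0, 1.0])   # a new work pointing elsewhere

threshold = 0.8  # inside the "bubble" if similarity exceeds this
print(cosine_similarity(reference, near_copy) > threshold)   # inside the bubble
print(cosine_similarity(reference, different) > threshold)   # comfortably outside
```

The near-copy lands inside the similarity bubble around the reference work; the different work sits well outside it and would earn its RightMark.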
While I’m not going to dive into a full implementation, here’s a rough sketch to give you a feel for how one might structure the heart of RightMark:
from typing import List

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


class RightMarkEngine:
    def __init__(self, reference_embeddings: List[np.ndarray], threshold: float):
        """
        reference_embeddings: vectorized embeddings of known copyrighted works
        threshold: the maximum cosine similarity to any known work that still
            counts as original (e.g., threshold=0.8 means a new work must not
            exceed 0.8 similarity to any reference work)
        """
        self.reference_embeddings = np.array(reference_embeddings)
        self.threshold = threshold

    def compute_embedding(self, new_work: str) -> np.ndarray:
        """
        Placeholder for actual embedding logic. In practice, you'd call a
        model like Sentence-BERT or a domain-specific embedding generator,
        e.g., embedding = model.encode(new_work).
        """
        return np.random.rand(768)  # mock embedding for illustration

    def assess_originality(self, new_work: str) -> bool:
        """
        Returns True if new_work is considered original enough by RightMark
        standards, i.e., its maximum cosine similarity to any known work
        falls below the threshold. (Cosine similarity ranges from -1 to 1,
        where 1 means identical direction.)
        """
        return self.similarity_score(new_work) < self.threshold

    def similarity_score(self, new_work: str) -> float:
        """
        Returns the maximum cosine similarity to any known work, for callers
        that want more detail than a boolean verdict.
        """
        new_embedding = self.compute_embedding(new_work)
        similarities = cosine_similarity(
            new_embedding.reshape(1, -1), self.reference_embeddings
        )
        return float(np.max(similarities))
Note: This is a stripped-down conceptual skeleton. In reality, you’d have:
- A reliable embedding function (e.g., Sentence Transformers for text, CLIP for images, or specialized embeddings for music).
- Domain-appropriate normalization and preprocessing.
- A set of reference embeddings stored efficiently, likely indexed with a vector database (like Faiss or Pinecone) for scalability.

Governance and Adaptation

One of the core challenges is determining the threshold. That’s where a governing body—the RightMark Foundation—steps in. Imagine a consortium of domain experts, creators, and legal advisers who periodically test and adjust thresholds based on real-world feedback, human judgments, and evolving standards. Different media might have different thresholds, and over time these can be recalibrated.
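The "stored efficiently" point is worth unpacking: at its core, RightMark's lookup is just a nearest-neighbor search over unit-normalized embeddings, which is exactly what libraries like Faiss accelerate (its flat inner-product index computes the same dot products, just much faster at scale). A brute-force NumPy version, with randomly generated stand-in embeddings, shows the shape of the operation:

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    # Unit-normalize along the last axis so dot product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
reference = normalize(rng.random((1000, 768)))  # stand-in corpus embeddings
query = normalize(rng.random(768))              # stand-in new-work embedding

scores = reference @ query      # cosine similarities to all 1000 works at once
best = int(np.argmax(scores))   # index of the closest known work
print(best, float(scores[best]))
```

At a few thousand reference works this brute-force scan is fine; at millions, an approximate index is what makes interactive "check my originality" feedback feasible.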
Why This is Exciting

RightMark isn’t just about enforcing rules—it’s about clarifying them. By giving creators immediate, objective feedback on how original their work is (compared to a large database of known content), we encourage experimentation and reduce accidental infringement.
Instead of feeling like some ephemeral “vibe check,” originality becomes a measurable criterion. Yes, “measurable originality” sounds paradoxical, but it’s really just an extension of what we do in search engines, recommendation systems, and plagiarism checks—now applied to the rich and nuanced world of creative production.
A Potential Future

As RightMark matures, we might see:
- Educational Tools: Students can quickly check if their essays are too derivative.
- Legal References: Courts consult RightMark reports as part of their decision-making process.
- Cultural Indices: We could measure how trends shift over time, how close new works cluster around certain styles, and detect periods of creative explosion.

And let’s not forget personal use-cases—like scanning the web to spot deep fakes or unauthorized use of your personal content.
Wrapping Up

RightMark feels obvious to me now, because once you start thinking about originality as a spatial concept—embeddings, distances, thresholds—it all clicks. It’s one of those ideas that makes you wonder why we haven’t done it yet. But that’s the beauty of it: sometimes the best concepts are the ones that, once articulated, feel like they were always waiting there in the background.
For more technical deep dives on embeddings and similarity searches, I’ve always appreciated the documentation at Nils Reimers’s Sentence Transformers project, which provides a practical and well-documented approach to embeddings and similarity scoring. While they focus on NLP, the principles extend across domains.
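To see the full text pipeline end to end without pulling in a heavy model, here's a sketch using scikit-learn's TF-IDF vectorizer as a lightweight stand-in for a semantic embedder. To be clear: TF-IDF only matches shared wording, not meaning—a real RightMark would swap in something like Sentence-BERT—but the mechanics (embed the corpus, embed the new work, compare with cosine similarity) are identical. The corpus and texts below are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus of "known works". A production system would use a semantic
# embedding model here; TF-IDF just keeps the example dependency-light.
corpus = [
    "a young wizard attends a school of magic",
    "a detective solves crimes in victorian london",
]
new_work = "a boy wizard studies magic at a hidden school"

vectorizer = TfidfVectorizer().fit(corpus + [new_work])
ref_vecs = vectorizer.transform(corpus)
new_vec = vectorizer.transform([new_work])

# One similarity score per known work; the wizard story scores higher
# because it shares vocabulary ("wizard", "magic", "school") with new_work.
scores = cosine_similarity(new_vec, ref_vecs)[0]
print(scores)
```

Swapping the vectorizer for a sentence-embedding model changes only the embedding step; the similarity math and thresholding stay the same.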
In the end, RightMark is about empowering creators and users with clarity. It turns guesswork into a measurable factor, encouraging honesty, innovation, and a healthier creative ecosystem. And to me, that’s worth carving a new space in the conversation.