RightMark: A Clear Line in the Sand for Originality

RightMark
AI
copyright
originality

12/10/2024




I’ve been circling around the idea of RightMark for a while now—a system that provides a measurable, objective standard to determine if a new creative work is too close to existing copyrighted material. It’s that simple. And the more I think about it, the more obvious it feels: content similarity can be measured, we have the tools, so why not apply them to the murky waters of copyright and originality?

Recap: What is RightMark?

RightMark is a proposed framework that vectorizes creative works (text, images, music, etc.) into a high-dimensional space and uses metrics like cosine similarity to measure how close a new work is to existing material. If you’re inside some defined “bubble” of similarity, that could mean infringement. If you’re comfortably outside that zone, you’ve earned your “RightMark”—a sort of originality certification.

The idea is to provide a quantitative measure for what’s traditionally subjective and legally tangled. Instead of wading through endless “is this too similar?” debates, RightMark offers a clear line in the sand (or rather, a defined radius in vector space).
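To make the "defined radius in vector space" concrete, here is a minimal sketch of a cosine-similarity bubble check, using toy three-dimensional vectors in place of real embeddings (the 0.8 threshold is an illustrative placeholder, not a calibrated value):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def inside_bubble(a: np.ndarray, b: np.ndarray, threshold: float = 0.8) -> bool:
    """A work falls inside another work's 'bubble' if similarity exceeds the threshold."""
    return cosine_similarity(a, b) > threshold

near_copy = np.array([1.0, 2.0, 3.0])
original  = np.array([1.1, 2.0, 2.9])   # almost the same direction
unrelated = np.array([-3.0, 1.0, 0.5])  # points somewhere else entirely

print(inside_bubble(near_copy, original))   # True: deep inside the bubble
print(inside_bubble(near_copy, unrelated))  # False: comfortably outside
```

The same inside/outside test scales up unchanged; only the vectors get longer and the corpus gets bigger.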

Why It Matters

  • Transparency: Creators know exactly how close their work is to known pieces, reducing the risk of accidental infringement.
  • Encouraging Novelty: Instead of playing it safe and remixing recent hits, creators are incentivized to explore new territory. They can see, objectively, when they’re drifting too close to existing content.
  • Legal Clarity: Courts and lawyers can reference a data-driven standard. This doesn’t replace legal judgment, but it provides a strong data point.
  • Scalable & Adaptable: As AI embeddings get better, RightMark gets more accurate. Thresholds can evolve over time, and different “bubbles” can apply for different use-cases—fair use, educational content, public domain works, and so on.
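The per-use-case "bubbles" in the last point could start as nothing fancier than a policy table. A minimal sketch, where every use-case name and number is a hypothetical placeholder rather than a calibrated value:

```python
# Hypothetical per-use-case similarity ceilings; the numbers are illustrative only.
BUBBLE_THRESHOLDS = {
    "commercial": 0.80,     # strictest: commercial works must keep the widest berth
    "fair_use": 0.90,       # commentary and parody tolerate higher similarity
    "education": 0.92,
    "public_domain": 1.01,  # effectively no bubble: any similarity is allowed
}

def threshold_for(use_case: str) -> float:
    """Look up the similarity ceiling for a use-case, defaulting to the strictest."""
    return BUBBLE_THRESHOLDS.get(use_case, min(BUBBLE_THRESHOLDS.values()))
```

Recalibrating over time then means versioning this table, not rewriting the engine.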

The Core Concept in Pseudo Code

While I’m not going to dive into a full implementation, here’s a rough sketch to give you a feel for how one might structure the heart of RightMark:

from typing import List
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

class RightMarkEngine:
    def __init__(self, reference_embeddings: List[np.ndarray], threshold: float):
        """
        reference_embeddings: List of vectorized embeddings representing known copyrighted works
        threshold: A similarity threshold that defines the boundary of allowed originality
                   e.g., works must have a cosine distance *less* than (1 - threshold) to be considered original
        """
        self.reference_embeddings = np.array(reference_embeddings)
        self.threshold = threshold

    def compute_embedding(self, new_work: str) -> np.ndarray:
        """
        Placeholder for actual embedding logic.
        In practice, you'd call a model like Sentence-BERT or a domain-specific embedding generator.
        """
        # Pseudo-code: embedding = model.encode(new_work)
        embedding = np.random.rand(768)  # mock embedding for illustration
        return embedding

    def assess_originality(self, new_work: str) -> bool:
        """
        Returns True if the new_work is considered original enough by RightMark standards.
        """
        new_embedding = self.compute_embedding(new_work)
        similarities = cosine_similarity(new_embedding.reshape(1, -1), self.reference_embeddings)
        max_similarity = np.max(similarities)

        # cosine_similarity ranges from -1 to 1, where 1 means identical direction.
        # With threshold=0.8, the work is original only if it is less than 80% similar
        # to every known work (max_similarity < 0.8).
        return max_similarity < self.threshold

    def similarity_score(self, new_work: str) -> float:
        """
        For more detail, return the maximum similarity score, not just a boolean.
        """
        new_embedding = self.compute_embedding(new_work)
        similarities = cosine_similarity(new_embedding.reshape(1, -1), self.reference_embeddings)
        return float(np.max(similarities))

Note: This is a stripped-down conceptual skeleton. In reality, you’d have:

  • A reliable embedding function (e.g., Sentence Transformers for text, CLIP for images, or specialized embeddings for music).
  • Domain-appropriate normalization and preprocessing.
  • A set of reference embeddings stored efficiently, likely indexed with a vector database (like Faiss or Pinecone) for scalability.

Governance and Adaptation

One of the core challenges is determining the threshold. That’s where a governing body—the RightMark Foundation—steps in. Imagine a consortium of domain experts, creators, and legal advisers who periodically test and adjust thresholds based on real-world feedback, human judgments, and evolving standards. Different media might have different thresholds, and over time these can be recalibrated.
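To give the scalability point some shape, here is a small sketch of the nearest-neighbor query at the heart of RightMark. I'm using scikit-learn's NearestNeighbors as a stand-in for a real vector database like Faiss or Pinecone, and the corpus size and dimensions are mock values:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Stand-in for a vector database: at small scale, a brute-force cosine index
# answers the same "what is the closest known work?" query.
rng = np.random.default_rng(42)
reference = rng.random((10_000, 128))  # mock corpus of 10k reference embeddings

index = NearestNeighbors(n_neighbors=1, metric="cosine").fit(reference)

def max_similarity(query: np.ndarray) -> float:
    """Return the highest cosine similarity between query and any indexed work."""
    distance, _ = index.kneighbors(query.reshape(1, -1))
    return 1.0 - float(distance[0, 0])  # cosine distance = 1 - cosine similarity

# an exact copy of a reference embedding should score ~1.0
print(max_similarity(reference[0]))
```

A production system would swap this for an approximate index, but the interface (query in, max similarity out) stays the same.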

Why This is Exciting

RightMark isn’t just about enforcing rules—it’s about clarifying them. By giving creators immediate, objective feedback on how original their work is (compared to a large database of known content), we encourage experimentation and reduce accidental infringement.

Instead of feeling like some ephemeral “vibe check,” originality becomes a measurable criterion. Yes, “measurable originality” sounds paradoxical, but it’s really just an extension of what we do in search engines, recommendation systems, and plagiarism checks—now applied to the rich and nuanced world of creative production.
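Making originality measurable also makes the threshold itself testable: calibrating it against human judgments can be framed as a tiny optimization, picking the cutoff that best agrees with labeled examples. The pairs below are invented for illustration:

```python
# Hypothetical calibration data: (max cosine similarity, human said "too close").
labeled_pairs = [
    (0.97, True), (0.91, True), (0.88, True), (0.95, True),
    (0.84, False), (0.75, False), (0.60, False), (0.70, False),
]

def best_threshold(pairs):
    """Scan candidate thresholds; return the one that agrees with the most labels."""
    candidates = sorted({sim for sim, _ in pairs})

    def agreement(t):
        # a pair agrees when "sim >= t" matches the human verdict
        return sum((sim >= t) == label for sim, label in pairs)

    return max(candidates, key=agreement)

print(best_threshold(labeled_pairs))  # 0.88 separates these labels perfectly
```

A real calibration run would use far more pairs and per-domain splits, but the shape of the loop is the same: gather judgments, re-fit the cutoff, republish the threshold.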

A Potential Future

As RightMark matures, we might see:

  • Educational Tools: Students can quickly check if their essays are too derivative.
  • Legal References: Courts consult RightMark reports as part of their decision-making process.
  • Cultural Indices: We could measure how trends shift over time, how close new works cluster around certain styles, and detect periods of creative explosion.

And let’s not forget personal use-cases—like scanning the web to spot deepfakes or unauthorized use of your personal content.

Wrapping Up

RightMark feels obvious to me now, because once you start thinking about originality as a spatial concept—embeddings, distances, thresholds—it all clicks. It’s one of those ideas that makes you wonder why we haven’t done it yet. But that’s the beauty of it: sometimes the best concepts are the ones that, once articulated, feel like they were always waiting there in the background.

For more technical deep dives on embeddings and similarity searches, I’ve always appreciated the documentation at Nils Reimers’s Sentence Transformers project, which provides a practical and well-documented approach to embeddings and similarity scoring. While they focus on NLP, the principles extend across domains.

In the end, RightMark is about empowering creators and users with clarity. It turns guesswork into a measurable factor, encouraging honesty, innovation, and a healthier creative ecosystem. And to me, that’s worth carving a new space in the conversation.



Subscribe to the Newsletter

Get notified when I publish new blog posts about game development, AI, entrepreneurship, and technology. No spam, unsubscribe anytime.

By subscribing, you agree to receive emails from Erik Bethke. You can unsubscribe at any time.
