Resisting Hyrum's Law with Private Constructors in Python

Hyrum's Law states:

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

However, by hiding implementation details and providing stable interfaces for operations your users want to do, the "sufficient" number can be increased.

Python doesn't support completely hiding implementation details but has a convention that names beginning with a single leading underscore (e.g. a function called _create) are classed as implementation details and that the library developer makes no stability guarantees.

Unfortunately, there are features that require implementing special methods with fixed names that can't be used to determine whether they are public.

__init__, the constructor special method, is a noteworthy challenge since the Python data model requires this method to always exist.

Why Might You Want to Make the Constructor Private?

You might want private constructors if:

Your object should only exist when returned from another object's method.

class Ref:
    def __init__(self, ...) -> None:
        ...

    @property
    def name(self) -> str:
        ...

    @property
    def version(self) -> str | None:
        ...

class Document:
    def get_element_by_name(self, name: str) -> Ref:
        ...

Your object can be constructed from different types and you want to manage your API surface with named constructors instead, and there's either no "natural" constructor or you want to have all constructors named for consistency.

Supposing we have a class representing a config file that can be constructed in a number of different ways, you could have an overloaded constructor where the parameter determines how to build it like this:

from io import TextIOBase, TextIOWrapper
from pathlib import Path
from typing import BinaryIO, TextIO

class MyConfigFile:
    f: TextIO

    def __init__(self, f: str | Path | int | BinaryIO | TextIO) -> None:
        if isinstance(f, str):
            f = Path(f)
        if isinstance(f, Path):
            f = f.open()
        if isinstance(f, int):
            f = open(f)
        if not isinstance(f, TextIOBase):
            f = TextIOWrapper(f)
        self.f = f

    ...

This form of overloading is useful when you've already committed to having an API based on the passed-in parameter, but with foresight you might find named constructors more manageable.

from io import TextIOWrapper
from pathlib import Path
from typing import BinaryIO, Self, TextIO

class MyConfigFile:
    f: TextIO

    def __init__(self, f: TextIO, /) -> None:
        self.f = f

    @classmethod
    def from_binary_file_object(cls, fobj: BinaryIO) -> Self:
        return cls(TextIOWrapper(fobj))

    @classmethod
    def from_file_descriptor(cls, fd: int) -> Self:
        return cls(open(fd))

    @classmethod
    def from_path(cls, path: Path) -> Self:
        return cls(path.open())

    @classmethod
    def from_name(cls, name: str) -> Self:
        return cls(open(name))

    ...

In this example there is a natural default constructor that wraps a pre-existing text file object, but exposing that constructor as API is a leaky abstraction that commits the implementation to wrapping the text file object.

You might find that the config file is rarely opened so lazily opening it on-demand would be a potential optimisation that is made more difficult by the constructor requiring an already open file.

So for consistency and API management it would be better to have another from_text_file_object named constructor.
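
A minimal sketch of what that might look like follows. The delegation of the other named constructors to from_text_file_object is an assumption made for illustration, and return types are written as quoted class names rather than Self only to keep the sketch runnable on older Python versions.

```python
from io import BytesIO, StringIO, TextIOWrapper
from pathlib import Path
from typing import BinaryIO, TextIO


class MyConfigFile:
    f: TextIO

    def __init__(self, f: TextIO, /) -> None:
        # Still callable in this sketch; treat it as an implementation detail.
        self.f = f

    @classmethod
    def from_text_file_object(cls, fobj: TextIO) -> "MyConfigFile":
        # Named like its siblings, so callers never rely on __init__'s signature.
        return cls(fobj)

    @classmethod
    def from_binary_file_object(cls, fobj: BinaryIO) -> "MyConfigFile":
        return cls.from_text_file_object(TextIOWrapper(fobj))

    @classmethod
    def from_path(cls, path: Path) -> "MyConfigFile":
        return cls.from_text_file_object(path.open())


text_config = MyConfigFile.from_text_file_object(StringIO("key = value\n"))
binary_config = MyConfigFile.from_binary_file_object(BytesIO(b"key = value\n"))
print(text_config.f.read() == binary_config.f.read())  # True
```

With every caller going through a named constructor, the signature of __init__ can change later (for example to open the file lazily) without breaking anyone.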

Your object may be constructed from the same types with the values interpreted differently, which requires named constructors, and there may not be a natural default constructor.

datetime has two named constructors for creation from a timestamp. datetime.fromtimestamp(timestamp) will create a datetime in the local timezone while datetime.utcfromtimestamp(timestamp) will create one in Coordinated Universal Time.

As with the constructor overloading example above, the datetime constructor exposes that the internal representation is separate attributes for the year, month, day, hour, minute, second, microseconds and time zone.

This has the same problem of making it hard to change the representation if it would be more convenient to operate on time as a single number.
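
The "same type, different interpretation" point can be seen with a short sketch. Note that datetime.utcfromtimestamp is deprecated since Python 3.12, so this sketch passes tz=timezone.utc to datetime.fromtimestamp to get the UTC interpretation instead.

```python
from datetime import datetime, timezone

timestamp = 0  # the Unix epoch

# Same argument type, two interpretations of the same number.
local_naive = datetime.fromtimestamp(timestamp)  # wall-clock time in the local zone
utc_aware = datetime.fromtimestamp(timestamp, tz=timezone.utc)  # explicitly UTC

print(utc_aware.isoformat())  # 1970-01-01T00:00:00+00:00
print(local_naive.tzinfo)  # None
```

Neither result is reachable by passing the timestamp to the plain datetime(...) constructor, which is why the named constructors exist.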

The only way to create an object may be expensive and a constructor may imply it's cheaper than it is.

The natural constructors of the last two examples are pretty cheap: they assign passed-in values to the object (though datetime does some validation).

An object representing the contents of some remote file may be represented as:

from urllib.parse import SplitResult as Url

class RemoteFile:
    origin: Url
    contents: bytes

Since the only way to construct it is to fetch the contents of the URL it might feel natural to fetch in the constructor.

from urllib.parse import SplitResult as Url
from urllib.request import urlopen

class RemoteFile:
    origin: Url
    contents: bytes

    def __init__(self, url: Url):
        self.origin = url
        with urlopen(url.geturl()) as f:
            self.contents = f.read()

However, if RemoteFile(url) looks cheap to construct, it may be accidentally called in a tight loop and cause performance problems in a way that is not obvious.

This would be more obvious with a named constructor.

from typing import Self
from urllib.parse import SplitResult as Url
from urllib.request import urlopen

class RemoteFile:
    origin: Url
    contents: bytes

    def __init__(self, url: Url, contents: bytes):
        self.origin = url
        self.contents = contents

    @classmethod
    def fetch(cls, url: Url) -> Self:
        with urlopen(url.geturl()) as f:
            return cls(url, f.read())

However this then means that the constructor is public and objects could be created directly, potentially causing confusion from the contents not matching what could be fetched from the URL.

How Do I Make My Constructor Uncallable Then?

Uncallable is a relative term in Python, but there are two important things we can achieve:

Preventing misuse of an object you shouldn't have created.
Getting a warning from tooling that you shouldn't use the constructor.

So How Do We Prevent the Object Being Created?

Conventionally NotImplementedError is raised from methods in Abstract Base Classes to indicate that the subclass is incomplete if the method is not overridden, which is similar enough to "this method should not be called".

class MyClass:
    def __init__(self):
        raise NotImplementedError("Don't use the MyClass constructor")

    @classmethod
    def _private_constructor(cls, foo):
        ...

Of course this then makes the constructor unusable for all users, so how do we implement our private constructor?
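
To make the problem concrete, here is a small sketch in which the private constructor naively goes through cls(...). The try_call helper and the body of _private_constructor are illustrative assumptions, not part of the original example.

```python
class MyClass:
    def __init__(self):
        raise NotImplementedError("Don't use the MyClass constructor")

    @classmethod
    def _private_constructor(cls, foo):
        # Naive attempt: calling cls() still runs __init__, so it fails too.
        return cls()


def try_call(func, *args):
    # Hypothetical helper: report whether the call was blocked.
    try:
        func(*args)
    except NotImplementedError as exc:
        return f"blocked: {exc}"
    return "constructed"


print(try_call(MyClass))  # blocked: Don't use the MyClass constructor
print(try_call(MyClass._private_constructor, "foo"))  # blocked: Don't use the MyClass constructor
```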

Private Constructors With Private Tokens

One approach is for the constructor to require that a special value be provided, and for that value to be explicitly marked as private.

Such a value is sometimes called a sentinel value, after its historic use as a special value at the end of a sequence to identify that it's over, e.g. the NUL byte at the end of a C string.

A historically popular way of creating these in Python is to use the object() constructor since it creates a unique value that is different from any other value that can be created.

_PRIVATE_TOKEN = object()

class MyClass:
    def __init__(self, foo, *, _private_token):
        if _private_token is not _PRIVATE_TOKEN:
            raise NotImplementedError("Don't use the MyClass constructor")

        self.foo = foo

    @classmethod
    def _private_constructor(cls, foo):
        return cls(foo, _private_token=_PRIVATE_TOKEN)

Since _PRIVATE_TOKEN and _private_constructor are both explicitly private, the user can't unintentionally create the object incorrectly, and will discover that they have created it the wrong way as soon as they test their code.

However, this additional parameter check is extra overhead that could cause worse performance in a tight loop.
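
The behaviour of the token check described above can be exercised with a short sketch. The class is repeated so the example is self-contained, and the describe helper is a hypothetical addition for reporting results.

```python
_PRIVATE_TOKEN = object()


class MyClass:
    def __init__(self, foo, *, _private_token):
        if _private_token is not _PRIVATE_TOKEN:
            raise NotImplementedError("Don't use the MyClass constructor")
        self.foo = foo

    @classmethod
    def _private_constructor(cls, foo):
        return cls(foo, _private_token=_PRIVATE_TOKEN)


def describe(func):
    # Hypothetical helper: name the exception a call raises, if any.
    try:
        func()
    except (TypeError, NotImplementedError) as exc:
        return type(exc).__name__
    return "ok"


# Omitting the token fails on the missing keyword-only argument.
print(describe(lambda: MyClass("bar")))  # TypeError
# A different object() is never identical to the private token.
print(describe(lambda: MyClass("bar", _private_token=object())))  # NotImplementedError
# Only the private constructor knows the right token.
print(describe(lambda: MyClass._private_constructor("bar")))  # ok
```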

How Can I Create Objects Without Using __init__?

__init__ is not the only special method that plays a part during object construction.

When you use an object's constructor such as our MyClass, the general process is:

self = MyClass.__new__(MyClass)
MyClass.__init__(self, foo)

So there is a separate step for creating an empty object of the appropriate type and initializing it, and __init__ is expected to call the constructors of its parent classes.
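
The two steps can be observed with a small sketch. The Point class and the calls list are hypothetical and exist only to record the order in which the special methods run.

```python
calls = []


class Point:
    def __new__(cls, x, y):
        # Step 1: allocate an empty instance of the right type.
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, x, y):
        # Step 2: initialise the instance that __new__ returned.
        calls.append("__init__")
        self.x = x
        self.y = y


p = Point(1, 2)
print(calls)  # ['__new__', '__init__']

# Doing the same two steps by hand:
q = Point.__new__(Point, 1, 2)
print(hasattr(q, "x"))  # False, because __init__ has not run yet
Point.__init__(q, 1, 2)
print((q.x, q.y))  # (1, 2)
```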

Which means if we don't want to use the constructor we can do this ourselves.

class MyClass:
    def __init__(self):
        raise NotImplementedError("Don't use the MyClass constructor")

    @classmethod
    def _private_constructor(cls, foo):
        self = super().__new__(cls)
        self.foo = foo
        super().__init__(self)
        return self

So now the constructor is unusable but we have an explicitly private constructor that we can use to implement our public named constructors or functions or methods on other objects that might want to return our object.
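
As a sketch of how that might be used, the example below adds a public named constructor and a method on another object that both go through the private constructor. The names from_int, Registry and lookup are hypothetical.

```python
class MyClass:
    def __init__(self):
        raise NotImplementedError("Don't use the MyClass constructor")

    @classmethod
    def _private_constructor(cls, foo):
        # Allocate the instance without running __init__.
        self = cls.__new__(cls)
        self.foo = foo
        return self

    @classmethod
    def from_int(cls, number):
        # Hypothetical public named constructor.
        return cls._private_constructor(str(number))


class Registry:
    # Hypothetical object whose method hands out MyClass instances.
    def lookup(self, name):
        return MyClass._private_constructor(name)


print(MyClass.from_int(7).foo)  # 7
print(Registry().lookup("bar").foo)  # bar
```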

Getting Tooling to Warn

You may have read this and thought "why don't we just document that the constructor shouldn't be used?" That is a good idea, but time is finite and developers typically work under pressure, so they reach for whatever looks most obvious and may only read the documentation after it doesn't work as expected.

Developers know this about themselves though, so invest in tooling like IDEs, test suites, pre-commit check hooks and CI pipelines to catch issues.

Type annotations let library developers annotate their types and functions so that IDEs and static type checkers like mypy can perform additional checking before the code is evaluated.

Here is our previous solution with type annotations:

from typing import Self

class MyClass:
    foo: str

    def __init__(self) -> None:
        raise NotImplementedError("Don't use the MyClass constructor")

    @classmethod
    def _private_constructor(cls, foo: str) -> Self:
        self = cls.__new__(cls)
        super().__init__(self)
        self.foo = foo
        return self

As you can see there's nothing indicating that __init__ should not be called in the type system.

Now that we are discussing annotations, it may have caught your attention that we have an annotation for functions that can only raise an exception or infinitely loop called NoReturn and it may be tempting to annotate the constructor with it.

However this will confuse mypy.

from typing import NoReturn, Self, reveal_type

class MyClass:
    foo: str

    def __init__(self) -> NoReturn:
        raise NotImplementedError("Don't call the MyClass constructor")

    @classmethod
    def _private_constructor(cls, foo: str) -> Self:
        self = cls.__new__(cls)
        super().__init__(self)
        self.foo = foo
        return self

reveal_type(MyClass)  # Revealed type is "def () -> Never"

Other type checkers like ty may have different results, but constructors are expected to return None and there are other ways to accommodate them.

NoReturn is a bottom type indicating a value that can't exist.

There's also the Never¹ type that represents unreachable code, a placeholder in generic types for states that don't exist or parameters to functions that cannot be called, which is precisely what we want to do with our __init__.

from typing import Never, Self

class MyClass:
    foo: str

    def __init__(self, *, _never: Never) -> None:
        raise NotImplementedError("Don't call the MyClass constructor")

    @classmethod
    def _private_constructor(cls, foo: str) -> Self:
        self = cls.__new__(cls)
        super().__init__(self)
        self.foo = foo
        return self

MyClass()  # error: Missing named argument "_never" for "MyClass"

So we must provide the _never parameter, which is explicitly marked as private, and there's no value you can pass that satisfies the type checker².

MyClass(_never=object())  # error: Argument "_never" to "MyClass" has incompatible type "object"; expected "Never"

This doesn't prevent the constructor from being called at runtime, but the raised exception will handle that.

What Are My Alternatives if This Doesn't Work for Me?

The simplest alternative is to rename your whole class with a leading underscore (e.g. _MyClass) to indicate that it is private but that will result in confusing documentation about whether it should be used.

Another, which follows a common pattern in Java, is a private implementation of a public interface.

Python doesn't have Java-like interfaces, but it does have Abstract Base Classes (ABCs) (and supports Java interface-like behaviour by allowing multiple inheritance). An ABC raises an exception when you attempt to create an object of a class that doesn't implement all the required methods, and supports run-time checking of whether an object is an instance of the required class.

Private construction can then be implemented by making only the ABC part of your public API and giving the subclass that implements the ABC a private name like this:

from abc import ABC, abstractmethod

class MyClass(ABC):
    # abstractproperty is deprecated; stack property on abstractmethod instead.
    @property
    @abstractmethod
    def foo(self) -> str:
        ...

class _MyClassImpl(MyClass):
    def __init__(self, foo: str) -> None:
        self._foo = foo

    @property
    def foo(self) -> str:
        return self._foo

def make_myclass(foo: str) -> MyClass:
    return _MyClassImpl(foo)

The run-time check that the class has been implemented correctly may be a useful safeguard, but it introduces additional run-time overhead that may not be acceptable.
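
The run-time behaviour of the ABC approach can be checked with a short sketch. It repeats the classes so that it is self-contained, using the non-deprecated combination of property and abstractmethod.

```python
from abc import ABC, abstractmethod


class MyClass(ABC):
    @property
    @abstractmethod
    def foo(self) -> str:
        ...


class _MyClassImpl(MyClass):
    def __init__(self, foo: str) -> None:
        self._foo = foo

    @property
    def foo(self) -> str:
        return self._foo


def make_myclass(foo: str) -> MyClass:
    return _MyClassImpl(foo)


# The public class cannot be instantiated because foo is still abstract.
try:
    MyClass()
except TypeError as exc:
    print("refused:", exc)

obj = make_myclass("bar")
print(isinstance(obj, MyClass))  # True
print(obj.foo)  # bar
```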

typing.Protocol is a newer alternative that supports static (or optionally run-time) duck-type checking of an object's behaviour without inheritance.

As a Protocol it would look like this:

from typing import Protocol

class MyClass(Protocol):
    foo: str

class _MyClassImpl:
    foo: str

    def __init__(self, foo: str) -> None:
        self.foo = foo

def make_myclass(foo: str) -> MyClass:
    return _MyClassImpl(foo)
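
If the optional run-time checking is wanted, one way to sketch it is with the typing.runtime_checkable decorator. The classes are repeated so that the example stands alone.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MyClass(Protocol):
    foo: str


class _MyClassImpl:
    foo: str

    def __init__(self, foo: str) -> None:
        self.foo = foo


def make_myclass(foo: str) -> MyClass:
    return _MyClassImpl(foo)


obj = make_myclass("bar")
# isinstance only checks that the attribute exists, not its type.
print(isinstance(obj, MyClass))  # True
print(isinstance(object(), MyClass))  # False

# A Protocol class itself cannot be instantiated.
try:
    MyClass()
except TypeError as exc:
    print("refused:", exc)
```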

Conclusion

I hope this advice has been interesting and helps you learn a new technique that you may find applicable in other circumstances.

¹ If you are unfortunate enough to only be able to use an old version of Python and unable to use third-party modules then NoReturn is equivalent to Never and can be used instead.
² Strictly you could use MyClass(_never=cast(Never, object())) to force it to pass type checking since it's not foolproof, but you would have to be a mighty fool indeed to do it and it'll get caught at runtime.