For LEGO collectors

Cataloguing a LEGO collection without the data entry

LEGO sets carry a lot of identifying detail. Most of it is already public. Here is how to keep a useful catalogue of a hundred sets without typing the same thing into a spreadsheet a hundred times.

Josh, 18 May 2026

You buy a LEGO set. The box has the official name on it, the set number in the bottom corner, the year on the back, the piece count along the side, sometimes the minifigure count too. The theme is on the lid in the same font LEGO has used for a decade. You can read all of it without putting the box down.

So why is everyone you ask about cataloguing their collection typing all of that into a spreadsheet by hand?

The answer is usually that they tried a tool that asked for the set number and sent it to a server somewhere to fetch the rest. They didn't love that. A collection of sets is not exactly state-secret information, but a complete inventory of what's in your house, with prices and where each set lives, isn't something most people want to ship to a stranger's database. So they fell back to the spreadsheet. The spreadsheet never sends 75192 anywhere and never asks anyone about it. It also doesn't know what 75192 is. So they type "Millennium Falcon" and "Star Wars" and "2017" and "7541 pieces" and "10 minifigures" into the row themselves, every time, for every set.

There is a better answer, and it doesn't involve a server.

What the set number already tells you

Every LEGO set is identified by a short number — 75192, 10497, 31203 — that LEGO has been using since the company started organising its own products. The community-run Rebrickable database catalogues every set LEGO has ever made: the name, the theme, the year, the number of pieces, the number of minifigures, an official product image. It's free, it's open, and the whole thing is downloadable as a CSV dump.

That's the trick. If the database is downloadable, you don't need to hit a server every time you type a set number. You download it once, look up set numbers locally, and the lookup never leaves your computer. The set number you typed — already public information printed on the box — doesn't go anywhere either. Your catalogue stays on your machine. Your collection stays yours.
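As a rough illustration of what "look up set numbers locally" means, here is a minimal Python sketch. It assumes the downloaded sets file is a CSV with columns named set_num, name, year and num_parts; those names, the trimmed layout, and the helper names load_sets and lookup are assumptions made for this example, so check them against the file you actually download. An in-memory sample stands in for the file so the sketch runs on its own.

```python
import csv
import io

# Stand-in for the downloaded sets file; a real run would open the file on disk.
# The column names and the single illustrative row are assumptions for this sketch.
SAMPLE_CSV = """set_num,name,year,num_parts
75192-1,Millennium Falcon,2017,7541
"""


def load_sets(handle):
    """Index every row of the sets file by its set number."""
    return {row["set_num"]: row for row in csv.DictReader(handle)}


def lookup(index, set_num):
    """Return the reference row for a set number, or None if it is unknown."""
    return index.get(set_num)


index = load_sets(io.StringIO(SAMPLE_CSV))
falcon = lookup(index, "75192-1")
print(falcon["name"], falcon["year"], falcon["num_parts"])
```

Nothing in the sketch opens a network connection: once the file is on disk, a lookup is a dictionary read.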

A catalogue built this way is faster to keep up to date than a spreadsheet, because typing a five-digit number and tabbing to the next field is faster than typing "Millennium Falcon, Star Wars, 2017, Ultimate Collector Series, 7541 pieces, 10 minifigures, retired." It's also more accurate, because Rebrickable's data has been corrected by its community for well over a decade and your typing has been corrected by you, between sips of coffee, at midnight.

What a real LEGO catalogue tracks

Identifying the set is the easy half. The harder half is everything that's specific to your copy of it.

The state of the box matters. Is it sealed in shrinkwrap, sitting in your closet as a future investment? Is it built and on a shelf? Half-built on the dining table because you started it last weekend? Parted out into bricks and folded into your loose-parts bin? These are real, persistent distinctions, and a serious catalogue records which one applies to each set.

Condition is its own axis. A built set is "complete" or "missing pieces"; a sealed box is "mint" or "near mint" or "light wear" depending on whether the corners are crisp and the shrinkwrap intact. The instruction manual lives or dies on its own scale — water rings, creases, a missing cover. Box and manual are paper goods that age differently from plastic bricks, so the catalogue should treat them separately.

Display details matter to people who display sets. Whether you've installed an LED kit. Whether the set lives on a shelf, in a custom case, in storage, in the loft. Knowing "where is set 75192 right now" is the difference between a catalogue and a list.

And then the boring-but-essential side: when you bought it, who from, how much you paid, what it's worth now. The kind of thing that's invisible until your insurer or a future buyer asks, and then it's the only thing that matters.

That's the shape of the catalogue you want. Most of it is manual entry. Most of that manual entry is one-time-per-set, and stays put once you've typed it.
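One way to picture that shape is as a record type. The Python sketch below is only an illustration: the class name CatalogueEntry, the field names, and the choice of free-text condition fields are all assumptions for this example, not the schema of any particular tool.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from enum import Enum
from typing import Optional


class BuildState(Enum):
    """The persistent states a copy of a set can be in."""
    SEALED = "sealed"
    BUILT = "built"
    PARTIAL = "partial"
    PARTED_OUT = "parted out"


@dataclass
class CatalogueEntry:
    """Everything specific to one copy of a set (hypothetical field names)."""
    set_number: str
    build_state: BuildState
    set_condition: str = ""      # e.g. "complete" or "missing pieces"
    box_condition: str = ""      # paper goods get their own scale
    manual_condition: str = ""
    has_led_kit: bool = False
    location: str = ""           # shelf, custom case, storage, loft
    purchased_on: Optional[date] = None
    purchased_from: str = ""
    price_paid: Optional[Decimal] = None
    current_value: Optional[Decimal] = None


entry = CatalogueEntry(
    set_number="75192",
    build_state=BuildState.BUILT,
    set_condition="complete",
    box_condition="light wear",
    has_led_kit=True,
    location="shelf",
)
print(entry.build_state.value, entry.location)
```

Only the set number and build state are required here; everything else can be filled in later, which matches the one-time-per-set nature of the entry work.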

The reference database changes the work

Once you accept that all of the public identifying detail is a download away, the actual work of cataloguing collapses to:

Type the set number.
Confirm the auto-filled fields are right (they almost always are; Rebrickable's data is good).
Type the things only you know: what state it's in, where it lives, what you paid for it, what you think it's worth.

That's it. Three minutes per set if you're being careful, often less. A hundred sets become a couple of evenings instead of a couple of weeks.

A few details that catch new cataloguers out the first time they try this:

Set numbers sometimes have a dash. Databases such as Rebrickable store each set with a numeric suffix, as in "75192-1". The suffix distinguishes sets that share the same base number, such as a number LEGO has reused or a re-release of the same design with minor packaging changes. Most of the time the bare number works (the lookup tolerates either), but if the lookup comes back empty, especially for an older set, try the dash form.
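A lookup that tolerates either form can be sketched in a few lines of Python. This is an assumption about how such a fallback might work, not a description of any specific tool; candidate_keys and find_set are hypothetical names, and the index is a plain dictionary keyed by the dashed form.

```python
def candidate_keys(typed):
    """Yield the forms worth trying for a typed set number, as typed first."""
    cleaned = typed.strip()
    yield cleaned
    if "-" not in cleaned:
        # Fall back to the "-1" form used for the first set with that number.
        yield f"{cleaned}-1"


def find_set(index, typed):
    """Return (key, row) for the first candidate found in the index, else None."""
    for key in candidate_keys(typed):
        if key in index:
            return key, index[key]
    return None


index = {"75192-1": {"name": "Millennium Falcon"}}
print(find_set(index, "75192"))
print(find_set(index, " 75192-1 "))
print(find_set(index, "99999"))
```

The sketch only ever guesses "-1". A set stored under a different suffix would still need the dash form typed in full.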

Variants are not the same as re-releases. A San Diego Comic-Con exclusive minifigure version of a set, a UK-only retailer-exclusive box art, an early-run version with a misprint on a tile — these aren't separate set numbers, but they matter for valuation. A serious catalogue has a free-text "variant" field for the version-specific note. The set number identifies the design; the variant note identifies your copy.

Photos are how you prove condition. If the catalogue is for insurance, the set number plus the auto-filled fields prove the set exists; a photo proves the condition was what you said it was. Always photograph sealed boxes face-on and at one corner, built sets from front and side, and the box separately if you kept it. The reference image from Rebrickable is for recognising the set; your photos are for proving the state of yours.

What this looks like in ClearBench

ClearBench ships a LEGO collection type that wires the Rebrickable lookup directly into the new-item flow. You type the set number into the first field of the form, the lookup fires the moment you tab out, and the set name, theme, year, piece count, and minifigure count fill in from your local copy of the database. None of that round-trips through the internet — the database file lives on your computer once you download it, and the lookup is just a query against that file.

The schema covers everything above out of the box. The build-state field includes Built, Sealed, Partial, Parted out, and On display. Box and manual carry their own condition scales separate from the set itself. There's a flag for whether you've installed an LED kit. The reference image from Rebrickable is fetched once per set on demand and cached, so you only ever fetch it the first time you want to see it and only for the sets you actually look at — your library doesn't pre-cache every image of every set you've ever owned.
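The fetch-once-and-cache behaviour for reference images follows a common pattern, sketched below in Python. This is not ClearBench's actual code: the function cached_image, the one-file-per-set cache layout, and the injected fetch callable are all assumptions for illustration. A fake fetcher stands in for the real download so the sketch runs without a network connection.

```python
import tempfile
from pathlib import Path


def cached_image(set_num, cache_dir, fetch):
    """Return the local path of a set's image, fetching it only on first use."""
    path = Path(cache_dir) / f"{set_num}.jpg"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(set_num))  # the only step that would touch the network
    return path


calls = []


def fake_fetch(set_num):
    """Stand-in for a real download; records each call so repeats are visible."""
    calls.append(set_num)
    return b"image bytes"


cache_dir = tempfile.mkdtemp()
first = cached_image("75192-1", cache_dir, fake_fetch)
second = cached_image("75192-1", cache_dir, fake_fetch)
print(first == second, len(calls))
```

The second call returns the same path without invoking the fetcher, which is the property the paragraph above describes: an image is requested once, and only for a set you actually open.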

The catalogue itself stays on your computer. There's no account, no server-side database, no cross-device sync that ships your inventory anywhere. The only times anything leaves your machine are the one-off reference-database download, your explicit fetches of set images, and the licence check when you launch the app. The set numbers themselves are public information; the catalogue you're building from them isn't, and it never goes anywhere.

The mental model the tool encourages is that the lookup gives you what's true about the set, and you provide what's true about your copy. The split is unambiguous, the workload is small, and the catalogue stays accurate as your collection grows — because the part that grows is the part you'd have typed by hand anyway, and the part you would have skimped on (was it 7541 pieces or 7491?) just shows up correctly without you.

If you've been putting off cataloguing your collection because it felt like a lot of typing — it is, but most of the typing isn't yours.