Live demo
Precision @ 10
0.154
Lift
4.45×
Catalogue
29,637
Build cost
< HK$500

Tag · Search · Recommend

Twenty-nine thousand toys — tagged, searched, recommended.

A vision-language model reads each product's image and title and emits hashtags. The 29,466 raw tags collapse into 4,609 canonical labels that anyone can search. A 64-dimensional autoencoder then re-ranks the matches to produce the top-10 recommendations — a 4.45× lift over the text-only baseline.

§ I

How we tag

One product, one pass. The VLM reads the image + title and emits hashtags; clustering collapses them into canonical labels.

Query console

§ II

Pick a query

§ IV

Recommended toys

Hybrid retrieval — the canonical-tag filter narrows 29,637 products to a shortlist; the 64-d embedding re-ranks the top ten.