DEV Community

Om Prakash

Posted on • Originally published at pixelapi.dev

Remove text and watermarks from any image — one API call

Most image-cleanup APIs make you do half the work yourself. You draw a mask, you upload the mask, you cross your fingers. We got tired of that. POST /v1/image/remove-text finds the text for you, erases it, and hands back a clean image. One call. One URL. Done.

What it does

remove-text is an auto-detect-and-erase pipeline for visible text in images. Point it at any public image URL and it segments out the text regions on its own — watermarks, captions, signage, burned-in timestamps, the corner-of-the-frame copyright text — then inpaints what was underneath using the surrounding image context. The output is the same image, minus the text, with the rest of the scene left intact.

The endpoint is POST /v1/image/remove-text. The request is JSON. The response is the cleaned image. There is nothing else to configure for the default path.

If you want finer control, three fields are exposed:

  • image_url — public URL of the source image. Required.
  • regions — an optional list of bounding boxes. Pass these if you want removal restricted to specific parts of the image instead of the whole frame. Useful when there is legitimate text you want to keep (a sign behind the subject, a label on a product) and text you want gone (a watermark over the foreground).
  • preserve_layout — boolean, defaults to true. With it on, the surrounding objects, edges, and structure stay where they are; the inpainting only fills the text region and blends to the local context. Turn it off only if you specifically want a freer regeneration of the masked area.

That is the whole surface. No mask uploads. No multi-step flow where you call a detection endpoint, post-process the boxes, and then call a separate inpainting endpoint. The detection and the inpainting happen on our side, in one round trip.
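Putting the three fields together, a fully specified request body looks like this (the region coordinates are illustrative — the box shape matches the quickstart example below, but check the docs for exact constraints):

```json
{
  "image_url": "https://example.com/source.jpg",
  "regions": [
    { "x": 1500, "y": 1000, "width": 380, "height": 60 }
  ],
  "preserve_layout": true
}
```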

A few things worth being explicit about:

  • It targets visible text. Watermarks, captions, signage, timestamps — the kinds of overlays that appear in pixels, not metadata.
  • It works on arbitrary backgrounds. The inpainting uses surrounding context, so it handles sky, skin, fabric, asphalt, foliage — whatever happens to be behind the text.
  • It is built so you do not have to author the mask. That is the whole point of the detection step happening on our side.

If you have used image-cleanup tooling before, you will notice the request body is almost empty. That is deliberate.

Why we built it

Every team that ships images at scale ends up needing this. Stock-photo workflows. Archive digitization. Re-use of legacy creative. Security-footage ingestion. Marketplaces scraping seller-supplied photos. The pattern is always the same: somebody, somewhere in the pipeline, baked text into the pixels, and now you need it gone before the image moves to the next stage.

The existing options are not great. The traditional path is: run a detector, get bounding boxes, draw a mask, post the mask plus the original image to an inpainting endpoint, hope the seams match. That is two or three services, two or three round trips, and a glue layer you now own. Most rival APIs expose only the inpainting half and expect you to bring the mask. Which is fine if you are a Photoshop user doing one image. It is a problem if you are a backend.

Our angle is simple: detect and remove in one call. We run our own segmentation step on the image, build the mask from the detected text regions, and then inpaint the masked area with surrounding context — all server-side, all in the same request. You do not see the mask. You do not need to see the mask. You get the finished image back.

A few design choices fall out of that:

  • No mask upload path. We deliberately did not ship a "bring your own mask" mode at launch. The point of the API is that you do not need one. If you have very specific requirements about where removal can happen, that is what regions is for — coarse bounding boxes are enough to constrain the work, and far easier to author than a pixel-precise mask.
  • preserve_layout defaults to on. The most common failure mode of generative inpainting is that it cheerfully invents new objects in the cleared area. For text removal specifically, you almost never want that — you want the same scene, minus the text. So that is the default behavior, not an opt-in.
  • Single endpoint, single response. No job IDs, no polling, no callback URLs for the default case. You call it, you get the image. We host the infrastructure so you do not have to figure out batch queues for what is, conceptually, a one-shot transform.
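Because the endpoint is a single synchronous call, the only client-side plumbing you might want is a small retry wrapper for transient network failures. A minimal sketch — the retry policy here is our own choice for illustration, not something the API requires:

```python
import time

def call_with_retry(fn, attempts=3, backoff=1.0):
    """Call fn(); retry on exception with exponential backoff.

    fn is any zero-argument callable, e.g. a lambda wrapping
    requests.post(...). Returns fn()'s result, or re-raises the
    last exception once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```

Wrap the `requests.post` call from the quickstart below in a lambda and pass it in.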

The thing we want to make boring is the part that is usually annoying: getting a clean image out of a dirty one.

Quickstart

The minimal call is one curl. Drop your API key in, point image_url at any reachable image, and you are done.

curl -X POST https://api.pixelapi.dev/v1/image/remove-text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/source.jpg"}'

Same thing in Python using requests:

import os
import requests

API_KEY = os.environ["PIXELAPI_KEY"]

resp = requests.post(
    "https://api.pixelapi.dev/v1/image/remove-text",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "image_url": "https://example.com/source.jpg",
    },
    timeout=60,
)
resp.raise_for_status()
result = resp.json()
print(result)

That covers the default case: full-image scan, layout preserved, text gone.
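The exact response schema is in the docs. As a sketch, assuming the JSON body carries a URL to the cleaned image under a field like `output_url` — a hypothetical name, check the reference for the real one — fetching and saving it looks like:

```python
import urllib.request

def save_cleaned_image(result: dict, path: str, opener=urllib.request.urlopen) -> str:
    """Download the cleaned image referenced in the API response.

    Assumes the response JSON exposes the cleaned image under an
    "output_url" field -- a hypothetical name; check the docs for
    the real schema. `opener` is injectable for testing.
    """
    with opener(result["output_url"]) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return path
```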

If you want to scope the removal — say, you only want to clean the bottom-right corner where the timestamp lives, and you want to leave the rest of the image untouched even if there happens to be incidental text in it — pass regions:

resp = requests.post(
    "https://api.pixelapi.dev/v1/image/remove-text",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "image_url": "https://example.com/cam-frame.jpg",
        "regions": [
            {"x": 1500, "y": 1000, "width": 380, "height": 60}
        ],
        "preserve_layout": True,
    },
    timeout=60,
)

And that is the API. There is no step three.

Use cases

Cleaning up timestamps burned into security-camera frames

If you operate any kind of CCTV or DVR pipeline, you know the problem. The camera firmware burns a timestamp directly into every frame — usually white text in a corner, sometimes with a black background block, sometimes not. The metadata is also in the file, so the burned-in copy is redundant. But the moment you want to do anything downstream — train a model on the frames, use the footage in an internal incident report, hand a snapshot to a customer — that timestamp is sitting there in the pixels and it cannot be turned off retroactively. Running each frame through remove-text with a regions box around the corner where the timestamp lives gives you a clean frame, with the rest of the scene unchanged. Wire it into your ingest path and the burned-in copy never leaves your pipeline. The metadata stays where it belongs — in the file headers — and the visual frame is yours to use.
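If the timestamp always sits in the same corner, the `regions` box can be computed from the frame dimensions instead of hard-coded per camera. A small helper, assuming a bottom-right overlay and assuming `(x, y)` is the box's top-left corner (verify against the docs); the 380×60 box and 20 px margin are illustrative:

```python
def bottom_right_region(frame_w, frame_h, box_w=380, box_h=60, margin=20):
    """Return a `regions` entry covering a bottom-right timestamp overlay.

    Follows the {x, y, width, height} box format from the quickstart,
    treating (x, y) as the box's top-left corner. Sizes are illustrative;
    measure your camera's overlay once and plug the numbers in.
    """
    return {
        "x": max(0, frame_w - box_w - margin),
        "y": max(0, frame_h - box_h - margin),
        "width": box_w,
        "height": box_h,
    }
```

For a 1080p frame, `{"image_url": url, "regions": [bottom_right_region(1920, 1080)]}` is the whole payload.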

Stripping captions off stock-photo previews you've licensed

Stock-photo workflows are full of friction. You browse, you license, you download the high-resolution version — and the licensed copy is supposed to come without the preview watermark. In practice, half the asset systems we have talked to end up with watermarked previews mixed into their working folders, either because someone grabbed a comp earlier in the process, or because a partner sent a reference file by mistake, or because the asset got cached at the preview stage and the clean version never replaced it. For images you have legitimately licensed, remove-text lets you reclaim those preview copies without re-downloading from the source. Detection handles the irregular placement of caption strips — corner, diagonal, full-frame tile — and the inpainting fills the area using whatever is around it. You end up with a usable working file from an asset you already paid for.

Erasing brand markings before re-using your own product imagery in a new campaign

Every brand team eventually hits this: a beautiful product photo from a previous campaign, where the photographer (or the agency, or the in-house designer) baked the campaign tagline or the old SKU label into the corner. Now you want to re-use the shot for a new campaign, and that old text is a non-starter. The traditional fix is a designer round-trip — open the file, clone-stamp the area, retouch the edges, save out, version, ship. For one image, fine. For two hundred, painful. remove-text handles the cleanup programmatically: point it at the originals, get back versions with the legacy markings gone, and let your designers spend their time on the new creative instead of erasing the old. With preserve_layout on, the product itself stays exactly where it was — only the text disappears.
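The batch version of that cleanup is a loop over request payloads, one per original. A sketch — it assumes your originals are already reachable at public URLs, since the API takes `image_url` rather than an upload:

```python
def build_batch_payloads(image_urls, preserve_layout=True):
    """Build one remove-text request body per source image URL.

    Full-frame detection (no `regions`) is the right default here:
    legacy taglines and SKU labels move around between shots, so
    let the server-side detector find them.
    """
    return [
        {"image_url": url, "preserve_layout": preserve_layout}
        for url in image_urls
    ]
```

Feed each payload to the `requests.post` call from the quickstart.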

Pricing

Pricing is per-call, flat, no tiers to negotiate.

  • Credits per call: 16
  • Price in INR: ₹0.011 per call
  • Price in USD: $0.00013 per call

That is the cleared-image, end-to-end price. Detection plus inpainting. No separate charge for the segmentation step. No surcharge for using the regions field. If the call succeeds, it costs 16 credits; if it fails, it does not.

At those numbers, a hundred thousand images is around ₹1,100 / $13. Most teams discover that the cost of running this in production is dwarfed by the cost of not running it — the manual cleanup hours, the back-and-forth with vendors over watermarked deliverables, the asset-management tickets that pile up because someone has to find a designer to redo a corner of a photo.
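The arithmetic is simple enough to sanity-check in a few lines, with the rates copied from the list above:

```python
CREDITS_PER_CALL = 16
INR_PER_CALL = 0.011
USD_PER_CALL = 0.00013

def cost(calls):
    """Return (credits, INR, USD) for a given number of billed calls.

    Failed calls are not billed, so `calls` counts successes only.
    """
    return (calls * CREDITS_PER_CALL,
            calls * INR_PER_CALL,
            calls * USD_PER_CALL)
```

`cost(100_000)` comes out to 1.6M credits, matching the roughly ₹1,100 / $13 estimate above.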

Try it

Sign in at https://pixelapi.dev/dashboard to get your API key. New accounts come with starter credits, so you can hit the endpoint with one of your own images before deciding anything.

Full reference for the request body, error codes, response format, and the regions and preserve_layout fields lives in the docs at https://pixelapi.dev/docs.

If you ship images at any kind of scale, give remove-text a real workload — a folder of timestamps, a batch of legacy product shots, a directory of licensed previews — and see what comes back. The whole point of this endpoint is that there is nothing else to learn. One URL in, one clean image out.
