Data extraction: Caching and performance

Data extraction performance comes from doing less work: skip metadata when you do not need it, avoid browser rendering when plain HTML is enough, and let the cache absorb repeated requests.

Extraction-specific speedups

The fastest successful extraction is a focused rule with no extra metadata and no browser render:

The following examples show how to use the Microlink API with CLI, cURL, JavaScript, Python, Ruby, PHP & Golang, targeting 'https://example.com' URL with 'data', 'meta' & 'prerender' API parameters:

CLI Microlink API example

microlink https://example.com&data.title.selector=h1&data.title.attr=text

cURL Microlink API example

curl -G "https://api.microlink.io" \
  -d "url=https://example.com" \
  -d "data.title.selector=h1" \
  -d "data.title.attr=text" \
  -d "meta=false" \
  -d "prerender=false"

JavaScript Microlink API example

import mql from '@microlink/mql'

const { data } = await mql('https://example.com', {
  data: {
    title: {
      selector: "h1",
      attr: "text"
    }
  },
  meta: false,
  prerender: false
})

Python Microlink API example

import requests

url = "https://api.microlink.io/"

querystring = {
    "url": "https://example.com",
    "data.title.selector": "h1",
    "data.title.attr": "text",
    "meta": "false",
    "prerender": "false"
}

response = requests.get(url, params=querystring)

print(response.json())

Ruby Microlink API example

require 'uri'
require 'net/http'

base_url = "https://api.microlink.io/"

params = {
  url: "https://example.com",
  data.title.selector: "h1",
  data.title.attr: "text",
  meta: "false",
  prerender: "false"
}

uri = URI(base_url)
uri.query = URI.encode_www_form(params)

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Get.new(uri)
response = http.request(request)

puts response.body

PHP Microlink API example

<?php

$baseUrl = "https://api.microlink.io/";

$params = [
    "url" => "https://example.com",
    "data.title.selector" => "h1",
    "data.title.attr" => "text",
    "meta" => "false",
    "prerender" => "false"
];

$query = http_build_query($params);
$url = $baseUrl . '?' . $query;

$curl = curl_init();

curl_setopt_array($curl, [
    CURLOPT_URL => $url,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => "",
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 30,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => "GET"
]);

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
    echo "cURL Error #: " . $err;
} else {
    echo $response;
}

Golang Microlink API example

package main

import (
    "fmt"
    "net/http"
    "net/url"
    "io"
)

func main() {
    baseURL := "https://api.microlink.io"

    u, err := url.Parse(baseURL)
    if err != nil {
        panic(err)
    }
    q := u.Query()
    q.Set("url", "https://example.com")
    q.Set("data.title.selector", "h1")
    q.Set("data.title.attr", "text")
    q.Set("meta", "false")
    q.Set("prerender", "false")
    u.RawQuery = q.Encode()

    req, err := http.NewRequest("GET", u.String(), nil)
    if err != nil {
        panic(err)
    }

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    fmt.Println(string(body))
}

import mql from '@microlink/mql'

const { data } = await mql('https://example.com', {
  data: {
    title: {
      selector: "h1",
      attr: "text"
    }
  },
  meta: false,
  prerender: false
})

Use meta: false and prerender: false whenever the page already ships the content in HTML.

The most effective extraction-specific optimizations are:

Set meta: false when you only need custom fields.
Use prerender: false for static or server-rendered pages.
Prefer waitForSelector over waitForTimeout.
Keep selectors focused instead of extracting huge chunks of DOM.
Disable javascript when the page does not need client-side execution.
Avoid scripts, modules, and function unless the page truly needs them.

Cache strategy

For the cache controls that apply to all workflows — ttl, staleTtl, force, and how to verify caching through response headers — see caching patterns.

A recommended production setup for repeated extractions:

The following examples show how to use the Microlink API with CLI, cURL, JavaScript, Python, Ruby, PHP & Golang, targeting 'https://example.com' URL with 'data', 'meta', 'ttl' & 'staleTtl' API parameters: