Markdown

Markdown extraction is a specialized form of Microlink data extraction. You still declare a data field, but the rule uses attr: 'markdown' so Microlink serializes HTML into Markdown instead of returning raw text or HTML.

The following examples call the Microlink API from the CLI, cURL, JavaScript, Python, Ruby, PHP, and Go, targeting https://microlink.io/docs/api/getting-started/overview with the data and meta API parameters:

CLI Microlink API example

microlink 'https://microlink.io/docs/api/getting-started/overview&data.article.selector=main&data.article.attr=markdown&meta=false'

cURL Microlink API example

curl -G "https://api.microlink.io" \
  -d "url=https://microlink.io/docs/api/getting-started/overview" \
  -d "data.article.selector=main" \
  -d "data.article.attr=markdown" \
  -d "meta=false"

JavaScript Microlink API example

import mql from '@microlink/mql'

const { data } = await mql('https://microlink.io/docs/api/getting-started/overview', {
  data: {
    article: {
      selector: "main",
      attr: "markdown"
    }
  },
  meta: false
})

Python Microlink API example

import requests

url = "https://api.microlink.io/"

querystring = {
    "url": "https://microlink.io/docs/api/getting-started/overview",
    "data.article.selector": "main",
    "data.article.attr": "markdown",
    "meta": "false"
}

response = requests.get(url, params=querystring)

print(response.json())

Ruby Microlink API example

require 'uri'
require 'net/http'

base_url = "https://api.microlink.io/"

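# Dotted parameter names must be string keys; bare `data.article.selector:` is not valid Ruby.
params = {
  "url" => "https://microlink.io/docs/api/getting-started/overview",
  "data.article.selector" => "main",
  "data.article.attr" => "markdown",
  "meta" => "false"
}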

uri = URI(base_url)
uri.query = URI.encode_www_form(params)

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Get.new(uri)
response = http.request(request)

puts response.body

PHP Microlink API example

<?php

$baseUrl = "https://api.microlink.io/";

$params = [
    "url" => "https://microlink.io/docs/api/getting-started/overview",
    "data.article.selector" => "main",
    "data.article.attr" => "markdown",
    "meta" => "false"
];

$query = http_build_query($params);
$url = $baseUrl . '?' . $query;

$curl = curl_init();

curl_setopt_array($curl, [
    CURLOPT_URL => $url,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => "",
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 30,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => "GET"
]);

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
    echo "cURL Error #: " . $err;
} else {
    echo $response;
}

Golang Microlink API example

package main

import (
    "fmt"
    "io"
    "net/http"
    "net/url"
)

func main() {
    baseURL := "https://api.microlink.io"

    u, err := url.Parse(baseURL)
    if err != nil {
        panic(err)
    }
    q := u.Query()
    q.Set("url", "https://microlink.io/docs/api/getting-started/overview")
    q.Set("data.article.selector", "main")
    q.Set("data.article.attr", "markdown")
    q.Set("meta", "false")
    u.RawQuery = q.Encode()

    req, err := http.NewRequest("GET", u.String(), nil)
    if err != nil {
        panic(err)
    }

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    fmt.Println(string(body))
}

Run the request and inspect data.article. The field name is yours, and attr: 'markdown' controls the serialization format.

MQL installation

To run the JavaScript examples with MQL, install @microlink/mql:

npm install @microlink/mql --save

It works in Node.js, Edge runtimes, and the browser. See the MQL installation guide for the environment-specific setup.

If you are using another language, you do not need to install MQL to follow this guide. You can use the terminal examples or call the API directly from any HTTP client.

The mental model

The same data rules from the Data extraction guide apply here:

The field name you declare becomes the response key.
attr: 'markdown' converts the matched HTML into Markdown.
Omit selector to convert the whole page.
Add selector when you only want the main content wrapper.

{
  url: 'https://microlink.io/docs/api/getting-started/overview',
  data: {
    article: {
      selector: 'main',
      attr: 'markdown'
    }
  },
  meta: false
}
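
The nested rule object above corresponds to the dot-notation query parameters used in the cURL and Python examples. A minimal sketch of that flattening, using only the Python standard library; `flatten_params` is a hypothetical helper name, not part of any Microlink SDK:

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.microlink.io/"


def flatten_params(value, prefix=""):
    """Flatten nested dicts into dot-notation keys (hypothetical helper)."""
    flat = {}
    for key, item in value.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(item, dict):
            # Recurse so {"data": {"article": {...}}} becomes "data.article.*".
            flat.update(flatten_params(item, name))
        elif isinstance(item, bool):
            # Booleans are sent as lowercase strings, as in the examples above.
            flat[name] = "true" if item else "false"
        else:
            flat[name] = str(item)
    return flat


options = {
    "url": "https://microlink.io/docs/api/getting-started/overview",
    "data": {"article": {"selector": "main", "attr": "markdown"}},
    "meta": False,
}

params = flatten_params(options)
request_url = f"{API_ENDPOINT}?{urlencode(params)}"
print(params)
```

The resulting `params` dictionary matches the `querystring` in the Python example, so it can be passed to any HTTP client.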

The response

The converted Markdown appears under the field you declared:

{
  "status": "success",
  "data": {
    "article": "# Overview\n\nMicrolink API lets you..."
  }
}
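
Reading the field back out of the payload could look like the sketch below. It parses the sample response shown above; `extract_field` is a hypothetical helper, and treating any status other than "success" as a failure is an assumption rather than documented behavior:

```python
import json

# Sample payload copied from the response shown above.
raw = '{"status": "success", "data": {"article": "# Overview\\n\\nMicrolink API lets you..."}}'


def extract_field(payload, field):
    """Return the declared data field from a parsed response body."""
    # Assumption: anything other than "success" means the request failed.
    if payload.get("status") != "success":
        raise ValueError(f"request failed with status {payload.get('status')!r}")
    return payload["data"][field]


markdown = extract_field(json.loads(raw), "article")
print(markdown.splitlines()[0])
```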

When meta is true (the default), the Markdown output includes a YAML frontmatter header with normalized metadata: title, description, url, author, publisher, image, logo, date, word_count, reading_time, and more. This gives LLMs richer context about the source page. Set meta: false to return only the raw Markdown content. See "Include metadata for richer context" for examples and details.
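
If a downstream step needs the metadata and the body separately, the frontmatter can be split off before further processing. The sketch below assumes the header uses the conventional `---` delimiter lines of YAML frontmatter, which this guide does not spell out; `split_frontmatter` and the sample string are hypothetical:

```python
def split_frontmatter(markdown):
    """Split a leading '---' delimited frontmatter block from the body.

    Returns (frontmatter_text, body); frontmatter_text is "" when absent.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", markdown
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return frontmatter, body
    # No closing delimiter found: treat the whole input as body.
    return "", markdown


# Hypothetical sample; real field values come from the API response.
sample = "---\ntitle: Overview\nurl: https://example.com\n---\n\n# Overview\n\nBody text."
frontmatter, body = split_frontmatter(sample)
print(frontmatter)
```

The frontmatter text is returned unparsed so that any YAML parser can be applied afterwards.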

Choose a starting pattern

| Need | Best pattern | Why |
| --- | --- | --- |
| Convert the whole document | Omit `selector` and use `attr: 'markdown'` | Fastest way to prototype or feed a full page into another system |
| Keep only the main article or docs body | Add `selector: 'main'` or `selector: 'article'` | Avoid nav, footer, cookie banners, and other chrome |
| Include page metadata for LLM context | Keep `meta: true` (the default) | Adds a YAML frontmatter header with title, description, author, dates, and more |
| Return Markdown plus a few supporting fields | Mix Markdown with other `data` rules | Useful for indexing, CMS imports, and LLM pipelines |
| Return the Markdown body directly | Keep the field in `data`, then use `embed` | Turns the API URL into a direct Markdown response |
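
The patterns above differ only in which query parameters are sent. A rough sketch of a URL builder that covers them; `markdown_url` is a hypothetical helper, and passing the declared field name as the `embed` value is an assumption to check against the embed documentation:

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.microlink.io/"


def markdown_url(target, field="article", selector=None, meta=True, embed=False):
    """Build an API URL for one of the starting patterns (hypothetical helper)."""
    params = {"url": target, f"data.{field}.attr": "markdown"}
    if selector is not None:
        # Scope the conversion to one wrapper; omit to convert the whole page.
        params[f"data.{field}.selector"] = selector
    if not meta:
        # meta defaults to true, so it is only sent when disabled.
        params["meta"] = "false"
    if embed:
        # Assumption: embed takes the name of the declared data field.
        params["embed"] = field
    return f"{API_ENDPOINT}?{urlencode(params)}"


page = "https://microlink.io/docs/api/getting-started/overview"
whole_page = markdown_url(page)
scoped = markdown_url(page, selector="main", meta=False)
direct = markdown_url(page, selector="main", meta=False, embed=True)
print(scoped)
```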

Use this guide for Markdown-specific decisions

This guide stays focused on the choices that are unique to Markdown extraction:

whole page vs scoped content
metadata frontmatter vs raw content only
clean conversion vs noisy page chrome
JSON vs direct Markdown delivery

When you need the broader rule system, jump to the detailed Data extraction pages:

Defining rules for nested objects, collections, typed fields, fallbacks, and evaluate.
Page preparation for the full rendering, waiting, device, and browser-automation toolbox.
Caching and performance for deeper cache tuning with ttl, staleTtl, and force.
Private pages and Troubleshooting for the shared auth, proxy, and debugging model.

Free tier and advanced features

The examples here work on the free tier. Custom headers, proxy, and configurable cache remain Pro plan features, exactly as in the Data extraction guide.

See the authentication and rate limit docs for the plan details.

What's next

This guide is intentionally small. Pick the next step based on what you need:

Choosing scope for picking the right wrapper, preparing the page state, and fixing noisy or incomplete Markdown.
Delivery and response shaping for JSON vs direct Markdown responses, performance defaults, and safe private-page handling.
Data extraction when you want the full shared MQL workflow beyond Markdown-specific decisions.