How to Extract Metadata from Notion Pages and Databases
Notion databases hold structured metadata that many teams rely on for project tracking, content management, and CRM workflows. This guide covers how to extract that data programmatically through the Notion API, handle pagination for large datasets, normalize the nested property format into clean output, and store results in external systems.
Why Extract Metadata from Notion
Notion has crossed 100 million registered users, and a significant portion of them use databases as lightweight project management tools, CRMs, content calendars, and knowledge bases. These databases store structured metadata across 22 property types, from simple text and numbers to relations between databases, rollup aggregations, and computed formulas.
The problem comes when you need that metadata outside Notion. Common scenarios include:
- Syncing project data to a reporting dashboard or data warehouse
- Feeding content metadata into a search index or static site generator
- Migrating structured records to a different platform
- Building automated workflows that react to database changes
- Creating backups that preserve property types and relations
Notion's built-in CSV export strips most of the structure. Relations become plain text. Formulas export as static values. Rollups disappear entirely. Multi-select tags lose their formatting. If you need typed, structured metadata, the API is the only reliable path.
The Notion API returns property values as nested JSON objects with type information preserved. A date stays a date with start and end fields. A relation keeps its linked page IDs. A formula includes both its result and result type. This makes the API output far more useful for downstream systems than any export format Notion offers natively.
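For instance, an abridged response for a date, a relation, and a formula property looks roughly like this (the property names here are placeholders):
{
  "Due": {
    "type": "date",
    "date": { "start": "2026-01-15", "end": null }
  },
  "Projects": {
    "type": "relation",
    "relation": [{ "id": "a1b2c3d4-0000-0000-0000-000000000000" }]
  },
  "Total": {
    "type": "formula",
    "formula": { "type": "number", "number": 1200 }
  }
}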
What to Check Before Extracting Metadata from Notion Pages and Databases
Before you can query any database, you need an internal integration with the right permissions. Here is the setup:
- Go to notion.so/my-integrations and click "New integration"
- Name it something descriptive like "Metadata Extractor"
- Select the workspace that contains your target databases
- Under Capabilities, check "Read content" (you do not need write access for extraction)
- Click Submit and copy the Internal Integration Secret
The secret starts with ntn_ and looks something like ntn_v2_abc123.... Store it as an environment variable rather than hardcoding it in your scripts.
The step most people miss: you must explicitly share each database with the integration. Open the database in Notion, click the three-dot menu in the top right, select "Connections," and add your integration by name. Without this step, every API call returns a 404 as if the database does not exist.
Test the connection with a quick curl:
curl https://api.notion.com/v1/databases/YOUR_DATABASE_ID \
-H "Authorization: Bearer $NOTION_API_KEY" \
-H "Notion-Version: 2022-06-28"
A successful response returns the database object with its properties field listing every column and its configuration. If you get a 401 or 404, double-check that the integration token is correct and the database is shared with the connection.
One note on the Notion-Version header: the API requires it on every request. Notion has shipped several new API versions through early 2026, each adding features like multi-source database support and the Views API. Use the latest stable version for new projects, but older versions continue to work.
Querying a Database and Reading Page Properties
With the integration connected, you can query a database to retrieve all its pages and their property values. The core endpoint is:
POST https://api.notion.com/v1/databases/{database_id}/query
Using the official Notion SDK, a minimal query with no filters returns every page in the database:
const { Client } = require("@notionhq/notion-client");
const notion = new Client({ auth: process.env.NOTION_API_KEY });
const response = await notion.databases.query({
database_id: "your-database-id",
});
for (const page of response.results) {
console.log(page.id, page.properties);
}
Each page in response.results includes a properties object where keys are column names and values are typed property objects. A select property looks like this:
{
"Status": {
"id": "abc123",
"type": "select",
"select": {
"id": "option-id",
"name": "In Progress",
"color": "blue"
}
}
}
You can narrow results with filters. Notion supports compound filters using and/or logic:
const response = await notion.databases.query({
database_id: "your-database-id",
filter: {
and: [
{
property: "Status",
select: { equals: "Published" },
},
{
property: "Created",
date: { on_or_after: "2026-01-01" },
},
],
},
sorts: [
{ property: "Created", direction: "descending" },
],
});
The filter_properties parameter lets you request only specific columns, which reduces response size when you do not need every field. Note that it takes property IDs rather than property names:
const response = await notion.databases.query({
  database_id: "your-database-id",
  // Property IDs, not display names; placeholder values shown here
  filter_properties: ["title", "status-property-id", "due-date-property-id"],
});
Four property types return paginated lists rather than inline values: title, rich_text, relation, and people. The database query endpoint returns up to 25 items inline for these types. If a property has more than 25 values (common with relation properties linking to many pages), you need to fetch the full list separately using the page property endpoint:
GET https://api.notion.com/v1/pages/{page_id}/properties/{property_id}
This endpoint returns a paginated list of property items with its own next_cursor for iteration.
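A sketch of that cursor loop using the SDK's pages.properties.retrieve method (pageId and propertyId stand in for whichever page and property you are expanding):
// Collect every item of a paginated property (title, rich_text, relation, people)
async function getAllPropertyItems(pageId, propertyId) {
  const items = [];
  let cursor = undefined;
  while (true) {
    const response = await notion.pages.properties.retrieve({
      page_id: pageId,
      property_id: propertyId,
      start_cursor: cursor,
    });
    items.push(...response.results); // paginated types return a list object
    if (!response.has_more) break;
    cursor = response.next_cursor;
  }
  return items;
}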
Handling Pagination for Large Databases
The Notion API returns a maximum of 100 pages per request. For databases with hundreds or thousands of entries, you need to paginate through the full result set.
Every query response includes two pagination fields:
- has_more: a boolean indicating whether more pages exist beyond the current batch
- next_cursor: a string token to pass as start_cursor in your next request
Here is a complete pagination loop that collects all pages from a database:
async function getAllPages(databaseId) {
const pages = [];
let cursor = undefined;
while (true) {
const response = await notion.databases.query({
database_id: databaseId,
start_cursor: cursor,
page_size: 100,
});
pages.push(...response.results);
if (!response.has_more) break;
cursor = response.next_cursor;
}
return pages;
}
A few practical considerations for production use:
Rate limits. Notion enforces a rate limit of roughly 3 requests per second per integration. For a database with 5,000 entries, that means 50 paginated requests taking about 17 seconds at best. Add a small delay between requests to stay under the limit:
const delay = (ms) => new Promise((r) => setTimeout(r, ms));
// Inside the loop, after each request:
await delay(350);
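If you would rather react to rate limiting than pace every call, here is a hedged sketch of a retry wrapper; it assumes the client surfaces a 429 as an error whose code is "rate_limited", which is how the official SDK reports it:
// Retry a call a few times when Notion responds with 429 (rate_limited)
async function withRetry(fn, retries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.code !== "rate_limited" || attempt >= retries) throw err;
      await delay(1000 * (attempt + 1)); // simple linear backoff
    }
  }
}
// Usage:
// const response = await withRetry(() => notion.databases.query({ database_id }));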
Stale cursors. If the database changes between paginated requests (new pages added, pages deleted), the cursor still works but you might see gaps or duplicates. For consistency-sensitive extractions, track page IDs and deduplicate after the full fetch completes.
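A minimal dedupe pass keyed on page ID might look like this:
// Collapse any duplicates introduced by edits that happened mid-fetch
function dedupeById(pages) {
  const seen = new Map();
  for (const page of pages) seen.set(page.id, page);
  return [...seen.values()];
}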
Property-level pagination. Remember that title, rich_text, relation, and people properties also paginate independently. If a page has a relation property linking to 50 other pages, the database query only returns the first 25 inline. Fetch the rest with the page property endpoint and its own cursor loop. Missing this step is one of the most common bugs in Notion extraction scripts.
Normalizing Notion Property Types
Raw Notion API responses are deeply nested. Every property wraps its value in a type-specific structure, which makes sense for the API's flexibility but creates friction when you want clean, flat data for a spreadsheet, database table, or JSON export.
Here is a normalizer function that flattens each property type to a usable value:
function normalizeProperty(property) {
switch (property.type) {
case "title":
return property.title.map((t) => t.plain_text).join("");
case "rich_text":
return property.rich_text.map((t) => t.plain_text).join("");
case "number":
return property.number;
case "select":
return property.select?.name ?? null;
case "multi_select":
return property.multi_select.map((s) => s.name);
case "status":
return property.status?.name ?? null;
case "date":
return property.date
? { start: property.date.start, end: property.date.end }
: null;
case "checkbox":
return property.checkbox;
case "url":
return property.url;
case "email":
return property.email;
case "phone_number":
return property.phone_number;
case "formula":
return property.formula[property.formula.type];
case "relation":
return property.relation.map((r) => r.id);
case "rollup":
return property.rollup[property.rollup.type];
case "people":
return property.people.map((p) => p.name || p.id);
case "files":
return property.files.map((f) =>
f.type === "external" ? f.external.url : f.file.url
);
case "created_time":
return property.created_time;
case "last_edited_time":
return property.last_edited_time;
case "created_by":
return property.created_by.name || property.created_by.id;
case "last_edited_by":
return property.last_edited_by.name || property.last_edited_by.id;
case "unique_id":
return property.unique_id.prefix
? `${property.unique_id.prefix}-${property.unique_id.number}`
: property.unique_id.number;
default:
return null;
}
}
Use this to transform an entire page into a flat record:
function normalizePage(page) {
const record = { id: page.id };
for (const [key, prop] of Object.entries(page.properties)) {
record[key] = normalizeProperty(prop);
}
return record;
}
const pages = await getAllPages(databaseId);
const records = pages.map(normalizePage);
The output is a clean array of objects ready for JSON export, database insertion, or API forwarding. A page that started as 50 lines of nested JSON becomes a flat object with string, number, boolean, and array values.
A few edge cases to watch for. Select and status properties can be null when no option is chosen. Files properties contain temporary URLs that expire after an hour for Notion-hosted files, so download them promptly if you need persistent access. Formula results vary in type (string, number, boolean, or date), and the nested type field tells you which one you are dealing with.
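If you need the files themselves, here is a quick sketch for downloading them before the signed URLs expire (it assumes Node 18+ for the global fetch, and that fileUrls is the array produced by the files case in normalizeProperty):
const fs = require("fs/promises");
async function downloadFiles(fileUrls, dir) {
  for (const [index, url] of fileUrls.entries()) {
    const response = await fetch(url); // Notion-hosted URLs work for about an hour
    const buffer = Buffer.from(await response.arrayBuffer());
    await fs.writeFile(`${dir}/file-${index}`, buffer);
  }
}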
Storing Extracted Metadata in External Systems
Once you have normalized records, you need somewhere to put them. The right destination depends on what you plan to do with the data.
Local files work for one-off exports and backups. Write records to JSON for structured access or CSV for spreadsheet compatibility:
const fs = require("fs");
fs.writeFileSync(
"notion-export.json",
JSON.stringify(records, null, 2)
);
For CSV, flatten any array values (like multi-select) into delimited strings before writing. Libraries like csv-stringify handle quoting and escaping edge cases that manual string joining misses.
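As a sketch with the csv-stringify package (the output file name and the "; " delimiter are arbitrary choices):
// npm install csv-stringify
const { stringify } = require("csv-stringify/sync");
const fs = require("fs");
// Join arrays (multi-select, relations) into delimited strings and
// JSON-encode remaining objects (like date ranges) so every cell is flat
const rows = records.map((record) =>
  Object.fromEntries(
    Object.entries(record).map(([key, value]) => {
      if (Array.isArray(value)) return [key, value.join("; ")];
      if (value && typeof value === "object") return [key, JSON.stringify(value)];
      return [key, value];
    })
  )
);
fs.writeFileSync("notion-export.csv", stringify(rows, { header: true }));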
Relational databases like PostgreSQL or SQLite are better for ongoing sync workflows where you query and update records over time. Map Notion property types to SQL column types: TEXT for titles and rich text, NUMERIC for numbers, BOOLEAN for checkboxes, JSONB for arrays like multi-select and relations. Use the Notion page ID as your primary key so re-running the extraction updates existing rows rather than creating duplicates.
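A minimal upsert sketch with node-postgres (pg); the table name, columns, and the record fields mapped here (Name, Status, Tags, Due) are placeholders for whatever properties your own database has:
const { Pool } = require("pg");
const pool = new Pool(); // reads connection settings from PG* environment variables
await pool.query(`
  CREATE TABLE IF NOT EXISTS notion_pages (
    id TEXT PRIMARY KEY,
    title TEXT,
    status TEXT,
    tags JSONB,
    due_date TEXT
  )
`);
for (const record of records) {
  // Page ID as primary key: re-running the extraction updates rows in place
  await pool.query(
    `INSERT INTO notion_pages (id, title, status, tags, due_date)
     VALUES ($1, $2, $3, $4, $5)
     ON CONFLICT (id) DO UPDATE SET
       title = EXCLUDED.title,
       status = EXCLUDED.status,
       tags = EXCLUDED.tags,
       due_date = EXCLUDED.due_date`,
    [
      record.id,
      record.Name,
      record.Status,
      JSON.stringify(record.Tags ?? []),
      record.Due?.start ?? null,
    ]
  );
}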
Cloud workspaces with structured extraction are useful when your Notion data references documents that also need metadata pulled from their contents. Fast.io's Metadata Views take a different approach to this problem. Instead of writing extraction code, you describe the fields you want in natural language, and the platform's AI builds a typed schema and populates a sortable, filterable view automatically. This works across PDFs, images, Word documents, spreadsheets, and scanned pages.
For a workflow that bridges both worlds, you could extract structured metadata from Notion databases using the API approach in this guide, then upload the referenced documents to a Fast.io workspace where Metadata Views pull additional fields from the file contents. The Notion record gives you project dates, assignees, and status. The document extraction gives you contract values, clause types, or invoice line items from the actual files.
Search indexes like Elasticsearch or Typesense turn your extracted metadata into a fast, faceted search experience. This is particularly useful when you need full-text search across properties that Notion's built-in search handles inconsistently, especially for large workspaces with thousands of pages.
Whatever destination you choose, build the extraction as an idempotent operation. Track last_edited_time to skip pages that have not changed since the last run, and use the page ID as a unique key. This turns your extraction script into a reliable sync pipeline that you can run on a schedule without worrying about duplicate or stale records.
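A sketch of what the incremental query might look like, assuming you persist the timestamp of the previous successful run between executions (lastRunISO here is a placeholder for that stored value):
// Only fetch pages edited since the last successful run
const response = await notion.databases.query({
  database_id: databaseId,
  filter: {
    timestamp: "last_edited_time",
    last_edited_time: { on_or_after: lastRunISO },
  },
});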
Frequently Asked Questions
How do you extract data from a Notion database?
Create an internal integration at notion.so/my-integrations, share your database with it, then use the POST /v1/databases/{id}/query endpoint to retrieve pages. Each page includes a properties object with typed values for every column. The official Notion SDK for JavaScript or Python simplifies authentication and pagination handling.
Can you export Notion page metadata via API?
Yes. The GET /v1/pages/{id} endpoint returns a page object with all its properties, including title, dates, selects, relations, and computed fields like formulas and rollups. For properties with more than 25 values (title, rich_text, relation, people), use the dedicated property endpoint at GET /v1/pages/{id}/properties/{property_id} to fetch the complete list.
What metadata fields does the Notion API return?
The API returns all 22 database property types: title, rich_text, number, select, multi_select, status, date, people, files, checkbox, url, email, phone_number, formula, relation, rollup, created_time, created_by, last_edited_time, last_edited_by, unique_id, and button. Each property includes its type, ID, and a type-specific value object with the actual data.
How do you bulk export Notion database properties?
Use the database query endpoint with pagination. Set page_size to 100 (the maximum) and loop through results using the next_cursor token until has_more is false. For a 5,000-row database, this takes about 50 API calls. Add a 350ms delay between requests to stay within Notion's rate limit of roughly 3 requests per second.
How do you handle Notion API pagination?
Every query response includes has_more (boolean) and next_cursor (string). Pass next_cursor as start_cursor in your next request to get the next batch. Keep looping until has_more is false. The same pattern applies to property-level pagination when fetching large relation or people properties via the page property endpoint.
Related Resources
Turn Documents into Queryable Metadata
Fast.io Metadata Views extract structured fields from PDFs, images, and documents using AI. Describe what you need in plain language and get a typed, searchable table. No extraction code required.