Fixing a Silent Notion Sync Failure in Dify
This post covers my merged contribution to Dify, a popular open-source LLM application platform. The fix addressed a critical regression that silently broke Notion knowledge-base synchronization for all self-hosted v1.13.0 users.
Primary References
- Issue: langgenius/dify#32705
- Pull Request: langgenius/dify#32747
- Regression source: langgenius/dify#32129 (DB session refactor)
- Main source file: document_indexing_sync_task.py
Background
Dify allows users to connect external knowledge sources — including Notion — as retrieval-augmented context for LLM applications. When a Notion page is modified, users click "Sync" in the Dify dashboard to pull the latest content into their knowledge base.
After upgrading to v1.13.0, self-hosted users reported that the sync button appeared to finish instantly, but the content was never updated. No user-facing error was shown — the failure was completely silent. (See Issue #32705 for the original bug report.)
The Symptom
In the Docker worker logs, the real error was buried:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'dict'
This told me that somewhere in the sync task, a Python dict was being written directly to a PostgreSQL text column — and psycopg2 does not know how to serialize a dict to a text field.
Root Cause Analysis
The bug was a classic serialization regression. In the document_indexing_sync_task, after detecting that a Notion page had changed, the task reads metadata, updates a timestamp, and writes it back:
# The broken code path
data_source_info = document.data_source_info_dict # returns a Python dict
data_source_info["last_edited_time"] = last_edited_time
document.data_source_info = data_source_info # ← raw dict to LongText column
The data_source_info column is a LongText field in the database. It stores JSON as a plain string, not as a native JSON type. Assigning a Python dict directly to this column causes psycopg2 to reject it at commit time.
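This serialization contract is easy to demonstrate in isolation. In the sketch below, sqlite3 stands in for PostgreSQL/psycopg2 (both drivers refuse to bind a Python dict to a TEXT column); the table name and payload are illustrative, not Dify's actual schema:

```python
import json
import sqlite3

# sqlite3 as a stand-in for PostgreSQL/psycopg2: both drivers reject
# parameter types they cannot adapt to a column value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (data_source_info TEXT)")

payload = {"notion_page_id": "abc123", "last_edited_time": "2024-02-01T00:00:00Z"}

# Binding the raw dict fails, analogous to psycopg2's "can't adapt type 'dict'".
try:
    conn.execute("INSERT INTO documents VALUES (?)", (payload,))
    dict_rejected = False
except sqlite3.Error:
    dict_rejected = True

# Serializing to a JSON string first satisfies the TEXT column's contract.
conn.execute("INSERT INTO documents VALUES (?)", (json.dumps(payload),))
(stored,) = conn.execute("SELECT data_source_info FROM documents").fetchone()
```

The exception type differs between drivers, but the boundary rule is the same: a text column accepts strings, and the caller owns the serialization.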
This regression was introduced in PR #32129, which refactored the sync task to use split database sessions. During the refactor, the original json.dumps() call was accidentally dropped. The full diff of my fix is in PR #32747.
The Masking Test Fixture
What made this bug especially interesting was that the existing integration tests did not catch it. Why?
The test suite included an autouse fixture that globally registered a psycopg2 adapter to convert dict objects to JSON:
# This fixture was HIDING the bug
import pytest
from psycopg2.extensions import register_adapter
from psycopg2.extras import Json

@pytest.fixture(autouse=True)
def _register_dict_adapter_for_psycopg2():
    """Align test DB adapter behavior with dict payloads used in task update flow."""
    register_adapter(dict, Json)
With this fixture active, psycopg2 could silently accept a raw dict — so the tests passed even though the production code path would fail. The fixture was essentially a workaround that masked the real serialization contract violation.
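The masking effect is straightforward to reproduce. The sketch below uses sqlite3's adapter registry as an analogue of psycopg2's register_adapter(dict, Json): once a global dict adapter is in place, the buggy raw-dict write silently succeeds, which is exactly why the tests stayed green:

```python
import json
import sqlite3

# Analogue of the masking fixture: a globally registered adapter that
# silently converts dicts to JSON at the driver boundary.
sqlite3.register_adapter(dict, json.dumps)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (data_source_info TEXT)")

# The buggy write of a raw dict now *succeeds* against this connection,
# so a test using it passes while production (no adapter) fails.
conn.execute(
    "INSERT INTO documents VALUES (?)",
    ({"last_edited_time": "2024-02-01T00:00:00Z"},),
)
(stored,) = conn.execute("SELECT data_source_info FROM documents").fetchone()
```

The stored value comes back as a JSON string, but only because the test environment quietly did the serialization the production code forgot.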
The Fix
The code fix itself was a single line:
# Before
document.data_source_info = data_source_info
# After
document.data_source_info = json.dumps(data_source_info)
This is consistent with every other data_source_info write in the entire Dify codebase. The regression was simply a missed serialization call during refactoring.
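The fixed read-update-write cycle can be sketched end to end. The helper below is hypothetical (sync_update is not a Dify function, and a SimpleNamespace stands in for the ORM-mapped Document row), but it mirrors the contract the fix restores: parse on read, json.dumps on write:

```python
import json
from types import SimpleNamespace

def sync_update(document, last_edited_time):
    """Mirror of the fixed flow: read JSON text, update it, write JSON text back."""
    info = json.loads(document.data_source_info)   # column stores a plain string
    info["last_edited_time"] = last_edited_time
    document.data_source_info = json.dumps(info)   # the restored serialization

# SimpleNamespace stands in for the ORM document (illustrative only).
doc = SimpleNamespace(data_source_info=json.dumps({"notion_page_id": "abc123"}))
sync_update(doc, "2024-02-01T00:00:00Z")

assert isinstance(doc.data_source_info, str)
assert json.loads(doc.data_source_info)["last_edited_time"] == "2024-02-01T00:00:00Z"
```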
Test Changes
Beyond the one-line fix, I made two important test changes:
1. Removed the masking fixture
I deleted the _register_dict_adapter_for_psycopg2 autouse fixture from the integration tests. This ensures that if a similar regression is introduced in the future, the tests will catch it immediately rather than silently adapting around it.
2. Added a regression unit test
I added TestDataSourceInfoSerialization with a test that exercises the full sync flow and explicitly asserts that data_source_info is a JSON string, not a dict:
def test_data_source_info_serialized_as_json_string(self, ...):
    """data_source_info must be serialized with json.dumps before DB write."""
    # ... setup mocks for the full sync flow ...
    document_indexing_sync_task(dataset_id, document_id)

    # Assert: must be a JSON string, not a dict
    assert isinstance(mock_document.data_source_info, str)
    parsed = json.loads(mock_document.data_source_info)
    assert parsed["last_edited_time"] == "2024-02-01T00:00:00Z"
Validation
# Lint and format checks
ruff check
ruff format --check
# Unit tests (4/4 passed)
pytest api/tests/unit_tests/tasks/test_document_indexing_sync_task.py -v
Review and Merge
The PR was reviewed and approved by @crazywoola, a Dify core maintainer, and merged on March 1, 2026. The patch scope was intentionally small — one functional line, one removed fixture, one added test — to minimize review burden and merge risk.
Lessons Learned
- Test fixtures can hide bugs. An autouse fixture that globally adapts types may keep tests green while production code is broken. Be suspicious of any fixture that modifies runtime adapter behavior.
- Serialization boundaries deserve explicit tests. When data crosses from Python objects to database columns, the serialization format should be asserted directly, not just the data content.
- Refactoring regressions are predictable. When code is restructured (like splitting DB sessions), serialization and type-conversion calls at data boundaries are the most likely casualties. These deserve targeted review attention during refactors.
- Silent failures are expensive. This bug produced no user-facing error — the sync just silently did nothing. Adding observability (logging, metrics) around task completion would help surface these failures faster.
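On that last point, a minimal sketch of what such observability could look like. This wrapper is hypothetical, not Dify's actual code; the task name mirrors the post, but the logging policy is an illustrative assumption:

```python
import logging

logger = logging.getLogger("dify.tasks")

# Hypothetical wrapper (illustrative, not Dify's implementation): make both
# completion and failure of a sync task explicit in the worker logs.
def run_sync_task(task, *args):
    """Run a sync task, logging completion and failure explicitly."""
    try:
        task(*args)
    except Exception:
        # A stack trace in the worker log beats a silent no-op in the UI.
        logger.exception("sync task %s failed args=%r", task.__name__, args)
        raise
    else:
        logger.info("sync task %s completed args=%r", task.__name__, args)
```

With a log line (or metric) on every terminal state, a sync that "finishes instantly" but logs a failure becomes visible to operators immediately instead of weeks later.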
Impact
- Users affected: all self-hosted Dify v1.13.0 users with Notion knowledge bases.
- Severity: data sync was completely broken, not degraded.
- Fix scope: minimal and surgical — one functional line of code, with negligible risk of side effects.
- Testing improvement: removed a masking fixture and added a direct regression test, making the codebase more honest.
Takeaway
The best open-source contributions often come from following a production error to its root cause, then fixing not just the code but also the testing gap that allowed it to ship. This PR is a good example: a one-line fix paired with a testing cleanup that makes the project more robust going forward.