Fixing a Silent Notion Sync Failure in Dify
This post covers my merged contribution to Dify, a popular open-source LLM application platform. The fix addressed a critical regression that silently broke Notion knowledge-base synchronization for all self-hosted v1.13.0 users.
Primary References
- Issue: langgenius/dify#32705
- Pull Request: langgenius/dify#32747
- Regression source: langgenius/dify#32129 (DB session refactor)
- Main source file: document_indexing_sync_task.py
Background
Dify allows users to connect external knowledge sources — including Notion — as retrieval-augmented context for LLM applications. When a Notion page is modified, users click "Sync" in the Dify dashboard to pull the latest content into their knowledge base.
After upgrading to v1.13.0, self-hosted users reported that the sync button appeared to finish instantly, but the content was never updated. No user-facing error was shown — the failure was completely silent. (See Issue #32705 for the original bug report.)
The Symptom
In the Docker worker logs, the real error was buried:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'dict'
This told me that somewhere in the sync task, a Python dict was being written directly to a PostgreSQL text column — and psycopg2 does not know how to serialize a dict to a text field.
Root Cause Analysis
The bug was a classic serialization regression. In the document_indexing_sync_task, after detecting that a Notion page had changed, the task reads metadata, updates a timestamp, and writes it back:
# The broken code path
data_source_info = document.data_source_info_dict # returns a Python dict
data_source_info["last_edited_time"] = last_edited_time
document.data_source_info = data_source_info # ← raw dict to LongText column
The data_source_info column is a LongText field in the database. It stores JSON as a plain string, not as a native JSON type. Assigning a Python dict directly to this column causes psycopg2 to reject it at commit time.
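This serialization contract is easy to demonstrate in isolation. In the sketch below, sqlite3 stands in for PostgreSQL/psycopg2 (both drivers refuse to bind a Python dict to a TEXT column); the table name and payload are illustrative, not Dify's actual schema:

```python
import json
import sqlite3

# sqlite3 as a stand-in for PostgreSQL/psycopg2: both drivers reject
# parameter types they cannot adapt to a column value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (data_source_info TEXT)")

payload = {"notion_page_id": "abc123", "last_edited_time": "2024-02-01T00:00:00Z"}

# Binding the raw dict fails, analogous to psycopg2's "can't adapt type 'dict'".
try:
    conn.execute("INSERT INTO documents VALUES (?)", (payload,))
    dict_rejected = False
except sqlite3.Error:
    dict_rejected = True

# Serializing to a JSON string first satisfies the TEXT column's contract.
conn.execute("INSERT INTO documents VALUES (?)", (json.dumps(payload),))
(stored,) = conn.execute("SELECT data_source_info FROM documents").fetchone()
```

The exception type differs between drivers, but the boundary rule is the same: a text column accepts strings, and the caller owns the serialization.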
This regression was introduced in PR #32129, which refactored the sync task to use split database sessions. During the refactor, the original json.dumps() call was accidentally dropped. The full diff of my fix is in PR #32747.
The Masking Test Fixture
What made this bug especially interesting was that the existing integration tests did not catch it. Why?
The test suite included an autouse fixture that globally registered a psycopg2 adapter to convert dict objects to JSON:
# This fixture was HIDING the bug
import pytest
from psycopg2.extensions import register_adapter
from psycopg2.extras import Json

@pytest.fixture(autouse=True)
def _register_dict_adapter_for_psycopg2():
    """Align test DB adapter behavior with dict payloads used in task update flow."""
    register_adapter(dict, Json)
With this fixture active, psycopg2 could silently accept a raw dict — so the tests passed even though the production code path would fail. The fixture was essentially a workaround that masked the real serialization contract violation.
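The masking effect is straightforward to reproduce. The sketch below uses sqlite3's adapter registry as an analogue of psycopg2's register_adapter(dict, Json): once a global dict adapter is in place, the buggy raw-dict write silently succeeds, which is exactly why the tests stayed green:

```python
import json
import sqlite3

# Analogue of the masking fixture: a globally registered adapter that
# silently converts dicts to JSON at the driver boundary.
sqlite3.register_adapter(dict, json.dumps)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (data_source_info TEXT)")

# The buggy write of a raw dict now *succeeds* against this connection,
# so a test using it passes while production (no adapter) fails.
conn.execute(
    "INSERT INTO documents VALUES (?)",
    ({"last_edited_time": "2024-02-01T00:00:00Z"},),
)
(stored,) = conn.execute("SELECT data_source_info FROM documents").fetchone()
```

The stored value comes back as a JSON string, but only because the test environment quietly did the serialization the production code forgot.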
The Fix
The code fix itself was a single line:
# Before
document.data_source_info = data_source_info
# After
document.data_source_info = json.dumps(data_source_info)
This is consistent with every other data_source_info write in the entire Dify codebase. The regression was simply a missed serialization call during refactoring.
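The fixed read-update-write cycle can be sketched end to end. The helper below is hypothetical (sync_update is not a Dify function, and a SimpleNamespace stands in for the ORM-mapped Document row), but it mirrors the contract the fix restores: parse on read, json.dumps on write:

```python
import json
from types import SimpleNamespace

def sync_update(document, last_edited_time):
    """Mirror of the fixed flow: read JSON text, update it, write JSON text back."""
    info = json.loads(document.data_source_info)   # column stores a plain string
    info["last_edited_time"] = last_edited_time
    document.data_source_info = json.dumps(info)   # the restored serialization

# SimpleNamespace stands in for the ORM document (illustrative only).
doc = SimpleNamespace(data_source_info=json.dumps({"notion_page_id": "abc123"}))
sync_update(doc, "2024-02-01T00:00:00Z")

assert isinstance(doc.data_source_info, str)
assert json.loads(doc.data_source_info)["last_edited_time"] == "2024-02-01T00:00:00Z"
```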
Test Changes
Beyond the one-line fix, I made two important test changes:
1. Removed the masking fixture
I deleted the _register_dict_adapter_for_psycopg2 autouse fixture from the integration tests. This ensures that if a similar regression is introduced in the future, the tests will catch it immediately rather than silently adapting around it.
2. Added a regression unit test
I added TestDataSourceInfoSerialization with a test that exercises the full sync flow and explicitly asserts that data_source_info is a JSON string, not a dict:
def test_data_source_info_serialized_as_json_string(self, ...):
    """data_source_info must be serialized with json.dumps before DB write."""
    # ... setup mocks for the full sync flow ...
    document_indexing_sync_task(dataset_id, document_id)

    # Assert: must be a JSON string, not a dict
    assert isinstance(mock_document.data_source_info, str)
    parsed = json.loads(mock_document.data_source_info)
    assert parsed["last_edited_time"] == "2024-02-01T00:00:00Z"
Validation
# Lint and format checks
ruff check
ruff format --check
# Unit tests (4/4 passed)
pytest api/tests/unit_tests/tasks/test_document_indexing_sync_task.py -v
Review and Merge
The PR was reviewed and approved by @crazywoola, a Dify core maintainer, and merged on March 1, 2026. The patch scope was intentionally small — one functional line, one removed fixture, one added test — to minimize review burden and merge risk.
Lessons Learned
- Test fixtures can hide bugs. An autouse fixture that globally adapts types may keep tests green while production code is broken. Be suspicious of any fixture that modifies runtime adapter behavior.
- Serialization boundaries deserve explicit tests. When data crosses from Python objects to database columns, the serialization format should be asserted directly, not just the data content.
- Refactoring regressions are predictable. When code is restructured (like splitting DB sessions), serialization and type-conversion calls at data boundaries are the most likely casualties. These deserve targeted review attention during refactors.
- Silent failures are expensive. This bug produced no user-facing error — the sync just silently did nothing. Adding observability (logging, metrics) around task completion would help surface these failures faster.
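On that last point, a minimal sketch of what such observability could look like. This wrapper is hypothetical, not Dify's actual code; the task name mirrors the post, but the logging policy is an illustrative assumption:

```python
import logging

logger = logging.getLogger("dify.tasks")

# Hypothetical wrapper (illustrative, not Dify's implementation): make both
# completion and failure of a sync task explicit in the worker logs.
def run_sync_task(task, *args):
    """Run a sync task, logging completion and failure explicitly."""
    try:
        task(*args)
    except Exception:
        # A stack trace in the worker log beats a silent no-op in the UI.
        logger.exception("sync task %s failed args=%r", task.__name__, args)
        raise
    else:
        logger.info("sync task %s completed args=%r", task.__name__, args)
```

With a log line (or metric) on every terminal state, a sync that "finishes instantly" but logs a failure becomes visible to operators immediately instead of weeks later.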
Impact
- Users affected: all self-hosted Dify v1.13.0 users with Notion knowledge bases.
- Severity: data sync was completely broken, not degraded.
- Fix scope: minimal and surgical — one functional line of code, with negligible risk of side effects.
- Testing improvement: removed a masking fixture and added a direct regression test, making the codebase more honest.
Takeaway
The best open-source contributions often come from following a production error to its root cause, then fixing not just the code but also the testing gap that allowed it to ship. This PR is a good example: a one-line fix paired with a testing cleanup that makes the project more robust going forward.