Illia Bahlai 3b0a7d70e2 Update README with YouTube support documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 11:07:47 +01:00

Obsidian Vault Bot

A Telegram bot that saves content (URLs, documents, images) to an Obsidian vault via Live Sync (CouchDB). It uses Claude AI to classify content automatically, suggest folders, and generate tags.

Features

  • URL Processing: Extract content from web pages and save as markdown notes
  • YouTube Videos: Extract metadata, description, and transcripts from YouTube videos
  • Image Analysis: AI-powered image description and classification
  • Document Processing: Extract text from PDFs, DOCX, and other formats
  • AI Classification: Automatic folder selection, tagging, and summarization
  • Obsidian Live Sync: Direct integration with CouchDB for real-time sync

Installation

# Clone the repository
git clone https://github.com/yourusername/obsidian-vault-bot.git
cd obsidian-vault-bot

# Install dependencies (requires uv package manager)
uv sync

Configuration

Create a .env file with the following variables:

# Telegram
BOT_TOKEN=your_telegram_bot_token

# Admin users (comma-separated Telegram user IDs)
# Leave empty to allow all users
ADMIN_USERS=123456789,987654321

# Anthropic (for AI classification)
ANTHROPIC_API_KEY=your_anthropic_api_key

# CouchDB (Obsidian Live Sync)
COUCHDB_URL=https://your-couchdb-server.com
COUCHDB_USER=your_username
COUCHDB_PASSWORD=your_password
COUCHDB_DATABASE=obsidian

# Optional: Custom folders and tags
PREDEFINED_FOLDERS=["Inbox", "Articles", "Videos", "Documents", "Images"]
PREDEFINED_TAGS=["reference", "tutorial", "news", "research", "personal"]

Tip: To get your Telegram user ID, send a message to @userinfobot
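
The variables above mix two encodings: ADMIN_USERS is comma-separated, while PREDEFINED_FOLDERS and PREDEFINED_TAGS are JSON arrays. A minimal sketch of how such values could be parsed follows. The helper names (parse_admin_users, parse_json_list, is_allowed) are hypothetical and are not taken from config.py; only the "empty means allow all users" rule comes from the comments above.

```python
import json


def parse_admin_users(raw: str) -> set[int]:
    """Parse comma-separated Telegram user IDs; empty input gives an empty set."""
    return {int(part) for part in raw.split(",") if part.strip()}


def parse_json_list(raw: str, default: list[str]) -> list[str]:
    """Parse a JSON array of strings, falling back to `default` when unset."""
    if not raw.strip():
        return default
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("expected a JSON array of strings")
    return value


def is_allowed(user_id: int, admin_users: set[int]) -> bool:
    """An empty admin set means every user is allowed."""
    return not admin_users or user_id in admin_users


# Example values copied from the .env snippet above.
admins = parse_admin_users("123456789,987654321")
folders = parse_json_list('["Inbox", "Articles", "Videos"]', default=["Inbox"])
```

With these values, is_allowed(123456789, admins) is true, and any user ID is accepted once ADMIN_USERS is left empty.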

Usage

# Run the bot inside the uv-managed environment
uv run python main.py

Telegram Commands

  • /add <url> - Save content from a URL
  • /add with image - Save the image with an AI description
  • /add with document - Save the document with its extracted content
  • Reply to a message with /add - Save the message content

Examples

/add https://example.com/article
/add https://www.youtube.com/watch?v=VIDEO_ID
/add https://youtu.be/VIDEO_ID

Send an image or document and reply with /add to save it with an AI-generated description.
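
Both YouTube URL forms in the examples point at the same video, so the bot has to tell them apart from ordinary web pages. One way this could be done with the standard library is sketched below. The function name youtube_video_id and the set of accepted hosts are assumptions for illustration, not the bot's actual routing logic.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID for youtube.com/watch and youtu.be links, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in {"youtube.com", "m.youtube.com"} and parsed.path == "/watch":
        # Long links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None
```

A URL for which this returns None would fall through to regular web page extraction.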

File Structure

When saving content, the bot creates:

{Folder}/
  raw_files/
    {filename}.{ext}     # Original image/document
  {Title}.md             # Markdown note with content

The markdown note includes:

  • YAML frontmatter with tags and source URL
  • Link to original file (for images/documents)
  • AI-generated summary
  • Extracted content
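
The note layout listed above could be assembled roughly as follows. This is a sketch under assumptions: the frontmatter keys (source, tags), the section headings, and the function name build_note are illustrative and may differ from what the bot writes. Values are emitted with json.dumps because JSON strings and arrays are also valid YAML.

```python
import json
from typing import Optional


def build_note(
    title: str,
    tags: list[str],
    summary: str,
    content: str,
    source_url: Optional[str] = None,
    raw_file: Optional[str] = None,
) -> str:
    """Assemble a markdown note: frontmatter, optional file link, summary, content."""
    lines = ["---"]
    if source_url:
        lines.append(f"source: {json.dumps(source_url)}")
    lines.append(f"tags: {json.dumps(tags)}")
    lines += ["---", "", f"# {title}", ""]
    if raw_file:
        # Obsidian wikilink to the original stored under raw_files/.
        lines += [f"[[raw_files/{raw_file}]]", ""]
    lines += ["## Summary", "", summary, "", "## Content", "", content, ""]
    return "\n".join(lines)


note = build_note(
    title="Example",
    tags=["reference", "news"],
    summary="Short summary.",
    content="Body text.",
    source_url="https://example.com/article",
    raw_file="photo.jpg",
)
```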

YouTube Videos

YouTube videos are saved with:

  • Video metadata (channel, duration, views)
  • Full description
  • Transcript/subtitles (when available in en/ru/uk/pl)
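
Choosing which transcript to save can be sketched as a pure function over the metadata dictionary. The example assumes a dictionary shaped like the one yt-dlp's extract_info returns, with subtitles and automatic_captions keys that map language codes to track lists. The preference order (manual subtitles before automatic captions, then en/ru/uk/pl) and both helper names are assumptions, not the bot's confirmed behavior.

```python
from typing import Optional

PREFERRED_LANGS = ("en", "ru", "uk", "pl")


def pick_subtitle_language(
    info: dict, preferred: tuple[str, ...] = PREFERRED_LANGS
) -> Optional[tuple[str, str]]:
    """Return (kind, lang) for the first available track, or None."""
    # Manual subtitles are tried before automatic captions.
    for kind in ("subtitles", "automatic_captions"):
        tracks = info.get(kind) or {}
        for lang in preferred:
            if tracks.get(lang):
                return kind, lang
    return None


def format_duration(seconds: int) -> str:
    """Format a duration in seconds as M:SS or H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"
```

When no track exists in any of the four languages, the function returns None and the note would simply omit the transcript section.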

Architecture

main.py                    # Entry point, dispatcher setup
config.py                  # Environment config
handlers/
  add_handler.py           # /add command handler
services/
  content_processor.py     # URL/document extraction via markitdown
  classifier.py            # Claude AI classification agent
  couchdb_storage.py       # Obsidian Live Sync CouchDB integration

CouchDB Document Format

The bot stores documents in Obsidian Live Sync format:

Metadata document:

{
  "_id": "folder/filename.md",
  "path": "Folder/Filename.md",
  "type": "plain",
  "children": ["h:chunk1", "h:chunk2"],
  "size": 1234,
  "ctime": 1234567890000,
  "mtime": 1234567890000,
  "eden": {}
}

Chunk document:

{
  "_id": "h:randomid",
  "type": "leaf",
  "data": "chunk content"
}

For binary files (images, PDFs):

  • type: "newnote"
  • Each chunk is independently base64 encoded
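
To make the relationship between the metadata document and its chunks concrete, here is a minimal sketch that builds both from a note or a binary payload. The chunk size, the random ID scheme, and the lowercased _id (inferred from the example above) are assumptions; Obsidian Live Sync's real chunking rules are more involved, and the name build_documents is hypothetical.

```python
import base64
import secrets
import time

CHUNK_SIZE = 1000  # hypothetical chunk length, for illustration only


def build_documents(path, payload, chunk_size=CHUNK_SIZE):
    """Return (metadata, chunks) for a str (text note) or bytes (binary file)."""
    is_binary = isinstance(payload, bytes)
    pieces = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

    chunks = []
    for piece in pieces:
        # Binary pieces are base64 encoded one by one, so each chunk decodes alone.
        data = base64.b64encode(piece).decode("ascii") if is_binary else piece
        chunks.append({"_id": "h:" + secrets.token_hex(8), "type": "leaf", "data": data})

    now_ms = int(time.time() * 1000)  # ctime/mtime are in milliseconds
    metadata = {
        "_id": path.lower(),
        "path": path,
        "type": "newnote" if is_binary else "plain",
        "children": [chunk["_id"] for chunk in chunks],
        "size": len(payload) if is_binary else len(payload.encode("utf-8")),
        "ctime": now_ms,
        "mtime": now_ms,
        "eden": {},
    }
    return metadata, chunks
```

Reading a file back is the reverse: fetch each ID in children in order, base64 decode the data when type is "newnote", and concatenate the results.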

Requirements

  • Python 3.12+
  • uv package manager
  • ffmpeg (for audio/video processing)
  • CouchDB server with Obsidian Live Sync
  • Telegram Bot Token
  • Anthropic API Key

License

MIT