Illia Bahlai 3b0a7d70e2 Update README with YouTube support documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>		2026-01-29 11:07:47 +01:00
handlers	Add YouTube video support with yt-dlp	2026-01-29 11:02:19 +01:00
services	Add YouTube video support with yt-dlp	2026-01-29 11:02:19 +01:00
.env.example	Add .env.example with all configuration options	2026-01-28 21:48:42 +01:00
.gitignore	init commit	2026-01-28 21:45:06 +01:00
CLAUDE.md	Add Obsidian Live Sync integration with proper CouchDB format	2026-01-28 21:45:06 +01:00
config.py	Add admin users restriction from .env	2026-01-28 21:48:02 +01:00
Dockerfile	Add ffmpeg to Dockerfile for audio/video processing	2026-01-28 22:16:22 +01:00
LICENSE	Initial commit	2026-01-28 20:40:27 +00:00
main.py	Add Obsidian Live Sync integration with proper CouchDB format	2026-01-28 21:45:06 +01:00
pyproject.toml	Add YouTube video support with yt-dlp	2026-01-29 11:02:19 +01:00
README.md	Update README with YouTube support documentation	2026-01-29 11:07:47 +01:00
uv.lock	Add YouTube video support with yt-dlp	2026-01-29 11:02:19 +01:00

README.md

Obsidian Vault Bot

A Telegram bot that saves content (URLs, documents, images) to an Obsidian vault via Live Sync (CouchDB). It uses Claude AI to automatically classify content, suggest folders, and generate tags.

Features

URL Processing: Extract content from web pages and save as markdown notes
YouTube Videos: Extract metadata, description, and transcripts from YouTube videos
Image Analysis: AI-powered image description and classification
Document Processing: Extract text from PDFs, DOCX, and other formats
AI Classification: Automatic folder selection, tagging, and summarization
Obsidian Live Sync: Direct integration with CouchDB for real-time sync

Installation

# Clone the repository
git clone https://github.com/yourusername/obsidian-vault-bot.git
cd obsidian-vault-bot

# Install dependencies (requires uv package manager)
uv sync

Configuration

Create a .env file with the following variables:

# Telegram
BOT_TOKEN=your_telegram_bot_token

# Admin users (comma-separated Telegram user IDs)
# Leave empty to allow all users
ADMIN_USERS=123456789,987654321

# Anthropic (for AI classification)
ANTHROPIC_API_KEY=your_anthropic_api_key

# CouchDB (Obsidian Live Sync)
COUCHDB_URL=https://your-couchdb-server.com
COUCHDB_USER=your_username
COUCHDB_PASSWORD=your_password
COUCHDB_DATABASE=obsidian

# Optional: Custom folders and tags
PREDEFINED_FOLDERS=["Inbox", "Articles", "Videos", "Documents", "Images"]
PREDEFINED_TAGS=["reference", "tutorial", "news", "research", "personal"]

Tip: To get your Telegram user ID, send a message to @userinfobot
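
The values above could be parsed along these lines. This is a minimal sketch, not the contents of config.py: the helper names `parse_admin_users`, `parse_json_list`, and `is_allowed` are hypothetical, and it assumes that PREDEFINED_FOLDERS and PREDEFINED_TAGS are JSON arrays, as the example values suggest.

```python
import json


def parse_admin_users(raw: str) -> set[int]:
    """Parse comma-separated Telegram user IDs; an empty string yields an empty set."""
    return {int(part) for part in raw.split(",") if part.strip()}


def parse_json_list(raw: str, default: list[str]) -> list[str]:
    """Parse a JSON array such as PREDEFINED_FOLDERS, falling back to a default."""
    if not raw.strip():
        return default
    value = json.loads(raw)
    if not isinstance(value, list):
        raise ValueError("expected a JSON array")
    return [str(item) for item in value]


def is_allowed(user_id: int, admins: set[int]) -> bool:
    """An empty admin set allows every user, matching the comment in the .env example."""
    return not admins or user_id in admins
```

In a real setup the raw strings would come from the environment, for example `os.environ.get("ADMIN_USERS", "")`.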

Usage

# Run the bot (use `uv run python main.py` if the virtual environment is not activated)
python main.py

Telegram Commands

/add <url> - Save content from URL
/add with image - Save image with AI description
/add with document - Save document with extracted content
Reply to a message with /add - Save message content

Examples

/add https://example.com/article
/add https://www.youtube.com/watch?v=VIDEO_ID
/add https://youtu.be/VIDEO_ID

Send an image or document and reply with /add to save it with AI-generated description.

File Structure

When saving content, the bot creates:

{Folder}/
  raw_files/
    {filename}.jpg       # Original image/document
  {Title}.md             # Markdown note with content

The markdown note includes:

YAML frontmatter with tags and source URL
Link to original file (for images/documents)
AI-generated summary
Extracted content
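
A note with the parts listed above might be assembled like this. The frontmatter keys (`tags`, `source`), the section headings, and the `![[...]]` embed syntax for the original file are assumptions chosen for illustration; the bot's real layout may differ.

```python
import json


def build_note(tags, source_url, summary, content, raw_file=None):
    """Assemble a markdown note: YAML frontmatter, optional file link, summary, content."""
    lines = ["---"]
    if tags:
        lines.append("tags:")
        # JSON strings are valid YAML double-quoted scalars, so json.dumps handles quoting.
        lines += [f"  - {json.dumps(tag, ensure_ascii=False)}" for tag in tags]
    else:
        lines.append("tags: []")
    lines.append(f"source: {json.dumps(source_url, ensure_ascii=False)}")
    lines += ["---", ""]
    if raw_file:
        # Hypothetical Obsidian embed pointing at the file stored under raw_files/.
        lines += [f"![[raw_files/{raw_file}]]", ""]
    lines += ["## Summary", "", summary, "", "## Content", "", content, ""]
    return "\n".join(lines)
```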

YouTube Videos

YouTube videos are saved with:

Video metadata (channel, duration, views)
Full description
Transcript/subtitles (when available in en/ru/uk/pl)

Architecture

main.py                    # Entry point, dispatcher setup
config.py                  # Environment config
handlers/
  add_handler.py           # /add command handler
services/
  content_processor.py     # URL/document extraction via markitdown
  classifier.py            # Claude AI classification agent
  couchdb_storage.py       # Obsidian Live Sync CouchDB integration

CouchDB Document Format

The bot stores documents in Obsidian Live Sync format:

Metadata document:

{
  "_id": "folder/filename.md",
  "path": "Folder/Filename.md",
  "type": "plain",
  "children": ["h:chunk1", "h:chunk2"],
  "size": 1234,
  "ctime": 1234567890000,
  "mtime": 1234567890000,
  "eden": {}
}

Chunk document:

{
  "_id": "h:randomid",
  "type": "leaf",
  "data": "chunk content"
}

For binary files (images, PDFs):

The metadata document uses type: "newnote" instead of "plain"
Each chunk is independently base64 encoded
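
Independent base64 encoding means each chunk can be decoded on its own, without needing its neighbors. A minimal sketch follows; the chunk size and the helper names are assumptions for illustration only.

```python
import base64


def encode_binary_chunks(data: bytes, chunk_size: int = 3000) -> list[str]:
    """Slice raw bytes first, then base64-encode every slice separately."""
    return [
        base64.b64encode(data[i:i + chunk_size]).decode("ascii")
        for i in range(0, len(data), chunk_size)
    ]


def decode_binary_chunks(chunks: list[str]) -> bytes:
    """Decode each chunk on its own and concatenate the resulting bytes."""
    return b"".join(base64.b64decode(chunk) for chunk in chunks)
```

Encoding the whole file once and then slicing the base64 string would not give the same result: slices whose length is not a multiple of four characters cannot be decoded on their own.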

Requirements

Python 3.12+
uv package manager
ffmpeg (for audio/video processing)
CouchDB server with Obsidian Live Sync
Telegram Bot Token
Anthropic API Key

License

MIT