Claude Cowork Data Processing
Use Claude Cowork to extract information from documents and data files. This is one of Claude Cowork's most important applications.
HTML File Data Extraction
If you've saved web pages as HTML, Claude Cowork can extract data:
I've saved 20 e-commerce product pages as HTML in "product_pages" folder.
Extract from each:
- Product name (usually in <h1> or class="product-title")
- Price (look for "$" or "price" text)
- Description (first product intro paragraph, max 200 chars)
- Stock status (look for "in stock", "out of stock")
- Original filename (for reference)
Generate an Excel file "product_info.xlsx" with all fields.
If a field isn't found, write "not found".
PDF Data Extraction
Claude Cowork excels at extracting structured info from PDFs:
Invoice Processing
"Invoices" folder contains 100+ PDF invoices.
Extract:
- Invoice number (usually starts with INV-)
- Date (convert to YYYY-MM-DD)
- Vendor name
- Buyer name
- Pre-tax amount
- Tax amount
- Total (verify: pre-tax + tax = total)
Special handling:
1. If total verification fails, note "amount mismatch" in remarks
2. If required fields are missing, mark "incomplete" in status
3. Sort by amount descending
Output:
- "invoice_summary_[date].xlsx" - complete table
- "problem_invoices.txt" - list of problematic invoice filenames
After processing, tell me success count and anomaly count.
Excel/CSV Processing
Data Cleaning
"RawData" folder has 5 customer data Excel files.
**Step 1: Merge**
- Combine 5 files into one
- Add "source_file" column
**Step 2: Clean**
- Remove duplicate rows
- Standardize phone numbers: remove spaces/dashes, keep 10-11 digits
- Convert emails to lowercase
- Standardize dates to YYYY-MM-DD
- Trim whitespace from all columns
**Step 3: Validate**
- Check phone numbers are 10-11 digits, mark invalid in "data_quality" column
- Check that emails contain @; mark invalid ones
- Check required fields (name, phone) not empty
**Output**:
- "customer_data_cleaned_[date].xlsx"
- "data_quality_report.txt" - duplicate and invalid counts
Tell me total records and issues found before cleaning.
Tips
Improve Accuracy
- Provide field location hints: Tell Claude Cowork where data typically appears
- Give format examples: Show expected output format
- Handle exceptions: Specify how to handle missing/invalid data
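The format and exception rules in the invoice and customer-data prompts are precise enough to re-check yourself once Claude Cowork has written its output. Below is a minimal Python sketch of those rules. It is not a description of how Claude Cowork works internally. The function names and the list of accepted date layouts are assumptions for illustration.

```python
import re
from datetime import datetime
from decimal import Decimal
from typing import Dict, List, Optional, Tuple

# Assumed input date layouts; adjust to what your files actually contain.
# Ambiguous layouts (e.g. 03/04/2024) need a decision from you, not a guess.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_phone(raw: Optional[str]) -> Tuple[str, bool]:
    """Remove spaces and dashes; valid only if 10-11 digits remain."""
    cleaned = re.sub(r"[\s-]", "", raw or "")
    return cleaned, re.fullmatch(r"\d{10,11}", cleaned) is not None


def normalize_email(raw: Optional[str]) -> Tuple[str, bool]:
    """Trim and lowercase; the prompt's only validity rule is 'contains @'."""
    cleaned = (raw or "").strip().lower()
    return cleaned, "@" in cleaned


def normalize_date(raw: Optional[str]) -> Optional[str]:
    """Return YYYY-MM-DD, or None when no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime((raw or "").strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def parse_amount(raw: str) -> Decimal:
    """Turn text like '$1,080.00' into Decimal('1080.00') so sums are exact."""
    return Decimal(raw.replace("$", "").replace(",", "").strip())


def invoice_remarks(pre_tax: str, tax: str, total: str) -> str:
    """Flag invoices where pre-tax + tax != total."""
    if parse_amount(pre_tax) + parse_amount(tax) != parse_amount(total):
        return "amount mismatch"
    return ""


def missing_fields(record: Dict[str, object], required: Tuple[str, ...]) -> List[str]:
    """Required fields that are absent or blank; any hit means 'incomplete'."""
    return [
        name for name in required
        if record.get(name) is None or not str(record.get(name)).strip()
    ]
```

Using `Decimal` rather than `float` matters for the total check: with binary floats, `0.1 + 0.2` does not equal `0.3` exactly, which would produce false "amount mismatch" flags.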
Performance
- Batch large jobs: Process files in chunks of 50
- Set checkpoints: Generate interim reports every 50 files
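If you script any of your own pre- or post-processing alongside Claude Cowork, the chunk-and-checkpoint pattern from these tips can be sketched as follows. `process_with_checkpoints`, `handle`, and the report wording are hypothetical. `handle` stands in for whatever per-file work you do.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
BATCH_SIZE = 50  # the chunk size suggested in the tip


def batches(items: Sequence[T], size: int = BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_with_checkpoints(
    items: Sequence[T], handle: Callable[[T], None], size: int = BATCH_SIZE
) -> List[str]:
    """Process items batch by batch and emit one checkpoint line per batch.

    A failing item is counted rather than aborting the run, so a bad file
    in batch 3 does not throw away the results of batches 1 and 2.
    """
    reports: List[str] = []
    ok = failed = 0
    for number, batch in enumerate(batches(items, size), start=1):
        for item in batch:
            try:
                handle(item)
                ok += 1
            except Exception:
                failed += 1
        line = f"checkpoint {number}: {ok + failed} done, {ok} ok, {failed} failed"
        print(line)  # the interim report for this batch
        reports.append(line)
    return reports
```

With 120 files, this produces three checkpoints (after 50, 100, and 120 files). If the run stops partway through, the last checkpoint line shows where to resume.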
Security
- ⚠️ Ensure Claude Cowork only accesses necessary folders
- ⚠️ Don't include real sensitive info in prompts
- ⚠️ Review output for accidental data exposure
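Part of that review can be automated. A quick scan of an exported text file for email addresses and bare 10-11 digit numbers (the phone rule from the cleaning example) surfaces obvious leaks. This is a hedged sketch: `find_exposures` and both patterns are assumptions, they miss formatted numbers such as "138 0013-8000", and they do not replace a human read-through.

```python
import re
from typing import Dict, List

# Illustrative patterns only; tune them for your own data.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 10-11 consecutive digits not touching other digits (unformatted phones)
    "phone": re.compile(r"(?<!\d)\d{10,11}(?!\d)"),
}


def find_exposures(text: str) -> Dict[str, List[str]]:
    """Return every match per pattern so a person can review each one."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
```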
Note: Claude Cowork processes local files, not live websites. Save web pages as HTML or export to CSV/Excel first.
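Once pages are saved locally, the field rules from the product-page prompt can be approximated with Python's standard-library `html.parser`. Those rules are: a name in <h1> or class="product-title", a "$" price, the first paragraph capped at 200 characters, stock phrases, and a "not found" fallback. This is handy for spot-checking a few rows of "product_info.xlsx". This is a hedged sketch: the helper names and the price pattern are hypothetical, "first product intro paragraph" is taken to mean the first non-empty <p>, and writing the Excel file is left out.

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from typing import Dict, List

NOT_FOUND = "not found"
PRICE_RE = re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?")  # assumed layout, e.g. "$12.99"


class ProductPageParser(HTMLParser):
    """Collect the title, <p> paragraphs, and all text nodes of a page.

    Simplification: capture stops at the first closing tag with the same name,
    so a product-title <div> that contains nested <div>s is cut short.
    """

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: List[str] = []
        self.all_text: List[str] = []  # includes any script/style text too
        self._tag = None   # tag name that opened the current capture
        self._kind = None  # "title" or "p"
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._tag is not None:
            return  # already inside a captured element
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" or "product-title" in classes:
            self._tag, self._kind, self._buf = tag, "title", []
        elif tag == "p":
            self._tag, self._kind, self._buf = tag, "p", []

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if self._kind == "title" and text and not self.title:
            self.title = text
        elif self._kind == "p" and text:
            self.paragraphs.append(text)
        self._tag = self._kind = None

    def handle_data(self, data):
        self.all_text.append(data)
        if self._tag is not None:
            self._buf.append(data)


def extract_product(html: str, filename: str) -> Dict[str, str]:
    parser = ProductPageParser()
    parser.feed(html)
    parser.close()
    page_text = " ".join(parser.all_text)
    lowered = page_text.lower()
    price = PRICE_RE.search(page_text)
    # Checked first so "notify me when back in stock" on a sold-out page
    # is not read as available.
    if "out of stock" in lowered:
        stock = "out of stock"
    elif "in stock" in lowered:
        stock = "in stock"
    else:
        stock = NOT_FOUND
    return {
        "product_name": parser.title or NOT_FOUND,
        "price": price.group(0) if price else NOT_FOUND,
        "description": parser.paragraphs[0][:200] if parser.paragraphs else NOT_FOUND,
        "stock_status": stock,
        "filename": filename,
    }


def extract_folder(folder: str = "product_pages") -> List[Dict[str, str]]:
    """Run extract_product over every saved .html file in a local folder."""
    return [
        extract_product(p.read_text(encoding="utf-8", errors="replace"), p.name)
        for p in sorted(Path(folder).glob("*.html"))
    ]
```

Comparing a handful of these rows against the spreadsheet is a cheap way to catch systematic misreads, such as a sale price picked up instead of the regular price.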