Native Python bindings, built with PyO3, give Python developers access to Words to Data's parsing and diffing capabilities.
Key Features
- Native Python API for parsing USC documents and bills
- Pythonic interfaces for diff computation
- Full type hints and documentation
- PyPI distribution for easy installation
Use Cases
- Data science and legal analytics workflows
- Jupyter notebook integration for research
- Web scraping and automated legal document processing
Extract richer metadata and structured information from Public Law documents, including sponsorship, voting records, and legislative intent.
Planned Enhancements
- Extract bill sponsors and co-sponsors
- Parse committee assignments and referrals
- Capture legislative history and amendments
- Identify effective dates and sunset provisions
- Extract cross-references to regulations and case law
- Improve detection of amending actions and intent
Benefits
- Comprehensive legislative tracking
- Better understanding of bill lifecycle
- Enhanced research capabilities for legal scholars
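
As a rough illustration of the "effective dates and sunset provisions" item, the sketch below scans plain text for a few common phrasings. It is not part of Words to Data: the function name `find_dates`, the patterns, and the sample sentence are all assumptions made for this example, and real statutory language uses many more forms (relative dates, "date of enactment", and so on) than these patterns cover.

```python
import re
from datetime import date

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Matches dates written like "October 1, 2024".
DATE_RE = r"(" + "|".join(MONTH_NAMES) + r")\s+(\d{1,2}),\s+(\d{4})"

# Hypothetical trigger phrases; a real extractor would need a much larger set.
EFFECTIVE_RE = re.compile(
    r"(?:shall take effect|takes effect|shall be effective)\s+on\s+" + DATE_RE,
    re.IGNORECASE,
)
SUNSET_RE = re.compile(
    r"(?:shall cease to be effective|shall terminate|shall expire)\s+on\s+" + DATE_RE,
    re.IGNORECASE,
)


def _to_date(match):
    """Convert the three captured groups (month name, day, year) to a date."""
    month_name, day, year = match.groups()
    month = MONTH_NAMES.index(month_name.capitalize()) + 1
    return date(int(year), month, int(day))


def find_dates(text):
    """Return effective and sunset dates found in the text, in document order."""
    return {
        "effective": [_to_date(m) for m in EFFECTIVE_RE.finditer(text)],
        "sunset": [_to_date(m) for m in SUNSET_RE.finditer(text)],
    }


if __name__ == "__main__":
    sample = (
        "This Act shall take effect on October 1, 2024. "
        "Section 3 shall cease to be effective on September 30, 2029."
    )
    print(find_dates(sample))
```

Pattern matching like this is only a starting point; it says nothing about which provision a date applies to, which is where the parsed document structure would come in.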
A specialized diff algorithm that detects and highlights changes to legal documents, with semantic awareness of legal structures and terminology.
Key Capabilities
- Detection of substantive vs. formatting changes
- Identification of definitional changes and their cascading effects
- Recognition of cross-reference updates and their implications
- Tracking of effective date modifications
- Detection of renumbering and reorganization patterns
Advanced Features
- Legal citation normalization and matching
- Smart handling of insertions, strikes, and replacements
- Classification of amendment types (expansion, restriction, clarification)
- Change impact scoring and severity assessment
- Support for conditional and contingent modifications
- Multi-version change tracking and lineage
Benefits
- More accurate change detection for legal documents
- Better understanding of legislative intent
- Reduced false positives from formatting changes
- Enhanced compliance and regulatory tracking
- Improved legal research and analysis workflows
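
To make the "substantive vs. formatting" distinction concrete, here is a minimal sketch using Python's standard `difflib`. It is an illustration under stated assumptions, not the planned algorithm: the `normalize` rules (whitespace and curly quotes only), the function `classify_changes`, and the sample provisions are all invented for this example. The idea is to diff the normalized lines, so that lines differing only in formatting still align, and then report aligned lines whose raw text differs as formatting changes.

```python
import difflib
import re


def normalize(line):
    """Collapse whitespace and straighten curly quotes (assumed formatting noise)."""
    line = line.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", line).strip()


def classify_changes(old_lines, new_lines):
    """Label each change between two versions as 'formatting' or 'substantive'."""
    old_norm = [normalize(line) for line in old_lines]
    new_norm = [normalize(line) for line in new_lines]
    matcher = difflib.SequenceMatcher(a=old_norm, b=new_norm, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Normalized text matches; any raw difference is formatting only.
            for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
                if old != new:
                    changes.append(("formatting", old, new))
        else:
            # Inserted, deleted, or replaced text survives normalization.
            changes.append((
                "substantive",
                "\n".join(old_lines[i1:i2]),
                "\n".join(new_lines[j1:j2]),
            ))
    return changes


if __name__ == "__main__":
    old = ["(a) The term  \u201cagency\u201d means an office.", "(b) The fee is $10."]
    new = ['(a) The term "agency" means an office.', "(b) The fee is $25."]
    for change in classify_changes(old, new):
        print(change)
```

A line-level diff like this cannot tell a renumbering from a rewrite or follow a changed definition through the text; those capabilities would need the structural awareness described above.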
Curated, labeled datasets of parsed legal documents ready for machine learning, research, and analysis.
Dataset Categories
- Complete USC title snapshots with temporal versions
- Annotated bill amendments with classified action types
- Legislative change tracking corpus (multi-year)
- Topic-labeled legal document collections
- Diff datasets showing legislative evolution
Applications
- Training legal language models
- Policy research and analysis
- Legal analytics and trend detection
- Academic research benchmarking
A system for tracking members of Congress, their voting records, and their behavioral patterns on legislation.
Core Capabilities
- Parse and store voting records from congress.gov
- Link votes to specific bills and amendments
- Track member sponsorship and co-sponsorship patterns
- Track committee membership and activity
- Analyze historical voting records
- Record party affiliation and district information
Analysis Features
- Voting pattern clustering and similarity analysis
- Ideology scoring based on voting behavior
- Bill co-sponsorship network analysis
- Legislative effectiveness metrics
- Topic-based voting alignment tracking
Use Cases
- Political science research
- Constituent tracking and accountability
- Advocacy group targeting and engagement
- Predictive modeling of legislative outcomes
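
One simple building block for the similarity analysis listed above is a pairwise agreement rate: the share of roll calls on which two members cast the same vote, counted only over roll calls where both voted. The sketch below assumes a hypothetical in-memory layout (member name mapped to a dictionary of roll-call ID and vote); the member names, roll-call IDs, and function names are made up for illustration and do not reflect an actual schema.

```python
from itertools import combinations


def agreement_rate(votes_a, votes_b):
    """Fraction of shared roll calls with identical votes, or None if none are shared."""
    shared = [roll for roll in votes_a if roll in votes_b]
    if not shared:
        return None
    agreements = sum(votes_a[roll] == votes_b[roll] for roll in shared)
    return agreements / len(shared)


def pairwise_agreement(records):
    """Compute the agreement rate for every unordered pair of members."""
    return {
        (a, b): agreement_rate(records[a], records[b])
        for a, b in combinations(sorted(records), 2)
    }


if __name__ == "__main__":
    # Invented sample data: member -> {roll-call id -> vote}.
    records = {
        "Member A": {"roll-1": "Yea", "roll-2": "Nay", "roll-3": "Yea"},
        "Member B": {"roll-1": "Yea", "roll-2": "Yea", "roll-3": "Yea"},
        "Member C": {"roll-4": "Nay"},
    }
    for pair, rate in pairwise_agreement(records).items():
        print(pair, rate)
```

The resulting pair-to-rate mapping can serve as a similarity matrix for clustering. Returning `None` for pairs with no shared roll calls keeps "never voted together" distinct from "always disagreed".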