Presto
What it is: An open-source distributed SQL query engine originally developed at Facebook (now Meta). It queries data across multiple sources without moving it, using ANSI SQL for all of them.
What It Does Best
Data federation. JOIN data across S3, MySQL, PostgreSQL, and Cassandra in one query. No ETL required.
Interactive speed. In-memory distributed execution. Queries over very large datasets typically return in seconds to minutes; sub-second latency is realistic only for small, well-partitioned scans.
Standard SQL. Broad ANSI SQL support. Analysts use familiar syntax across all data sources.
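To make the federation point concrete, here is a minimal Python sketch that builds a single query joining tables from two catalogs. Presto addresses tables as catalog.schema.table; the catalog names (hive, mysql), schemas, tables, and columns below are hypothetical placeholders, not part of any real deployment.

```python
# Hypothetical catalog, schema, table, and column names; adjust to your deployment.
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return Presto's three-part table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(orders_table: str, customers_table: str) -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    return (
        "SELECT c.name, SUM(o.total) AS revenue\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {customers_table} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.name\n"
        "ORDER BY revenue DESC\n"
        "LIMIT 10"
    )


orders = qualified("hive", "sales", "orders")       # e.g. Parquet files on S3
customers = qualified("mysql", "crm", "customers")  # e.g. a MySQL database
query = build_federated_query(orders, customers)
print(query)
```

The resulting text can be submitted through any Presto client (CLI, JDBC, or a Python driver). The engine reads from both sources at query time, so no copy step is needed beforehand.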
Key Features
Federation: Dozens of connectors to different data sources
Cost-based optimizer: Intelligent query planning
In-memory execution: Fast distributed query processing
ANSI SQL: Standard SQL across all sources
Extensible: Custom connectors and functions
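Connectors are wired in through catalog properties files, one file per data source. The sketch below renders such a file for the MySQL connector; the host, user, and password are placeholders, and the helper function is only an illustration, not part of Presto.

```python
def render_catalog_properties(settings: dict) -> str:
    """Render key=value lines in the properties style used by catalog files."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Placeholder host and credentials. Saved as etc/catalog/mysql.properties,
# this file would register a catalog named "mysql".
mysql_catalog = {
    "connector.name": "mysql",
    "connection-url": "jdbc:mysql://mysql.example.internal:3306",
    "connection-user": "presto_reader",
    "connection-password": "change-me",
}

properties_text = render_catalog_properties(mysql_catalog)
print(properties_text, end="")
```

The file name (without the .properties extension) becomes the catalog name used in queries, which is why tables in this source would be referenced as mysql.schema.table.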
Pricing
Open Source: Free, Apache 2.0 license
Amazon EMR: EC2 compute costs plus the per-instance EMR charge
Self-managed: Cloud VM costs only
Note: Consider Trino, the fork formerly known as PrestoSQL, for new projects; it is under more active development
When to Use It
Data lake analytics (S3, HDFS)
Querying across multiple data sources
Ad-hoc exploratory analytics
Too much data to move into a warehouse
Existing Presto deployments
When NOT to Use It
Operational (OLTP) workloads: Presto is built for analytics only
Small datasets: the cluster overhead is not worth it
Data persistence: Presto is a compute-only layer and stores nothing itself
New projects: use Trino instead
Latest features: Trino is under more active development
Common Use Cases
Data lake queries: SQL on files in S3/HDFS
Cross-database joins: Combine data from multiple sources
Ad-hoc analysis: Explore data without moving it
Data virtualization: Unified view across systems
ETL alternative: Query in place instead of copying
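As an illustration of the data lake case, the sketch below builds a CREATE TABLE statement that maps existing files to a table through the Hive connector, followed by an ordinary query against it. It assumes a catalog named hive with a schema named web; the bucket path, table name, and columns are hypothetical.

```python
def build_external_table_ddl(table: str, location: str, file_format: str = "PARQUET") -> str:
    """Build a CREATE TABLE statement for files that already exist at `location`."""
    return (
        f"CREATE TABLE {table} (\n"
        "  view_time timestamp,\n"
        "  user_id bigint,\n"
        "  page_url varchar\n"
        ")\n"
        "WITH (\n"
        f"  format = '{file_format}',\n"
        f"  external_location = '{location}'\n"
        ")"
    )


# Hypothetical table name and bucket path.
ddl = build_external_table_ddl("hive.web.page_views", "s3://example-bucket/page_views/")
follow_up = "SELECT page_url, count(*) AS views FROM hive.web.page_views GROUP BY page_url"
print(ddl)
print(follow_up)
```

The table definition is metadata only. The files stay where they are, and the follow-up query reads them in place.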
Presto vs Alternatives
vs Trino: Trino has more active development; prefer Trino for new projects
vs Athena: Presto gives more control; Athena is fully managed and serverless
vs Spark SQL: Presto is faster for interactive queries; Spark is better for long-running batch jobs
Unique Strengths
Query federation: JOIN across any data source that has a connector
No data movement: Query data where it lives
Interactive speed: Fast enough for interactive exploration
ANSI SQL: Standard SQL everywhere
Bottom line: A query engine for distributed analytics. Don't move data; query it where it lives. Great for data lakes. Note: Presto split into PrestoDB and Trino (formerly PrestoSQL). Existing PrestoDB deployments can stay where they are; new projects should generally start with Trino.