SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Successfully reported this slideshow.
Activate your 30 day free trial to unlock unlimited reading.
Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
Senior Staff Software Engineer, Data infrastructure at LinkedIn
Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
• 100B documents • 1B
documents ingested per day • 100M queries per day • 10’s of ms latency • 30 tables in prod, 250 * 3 std app nodes Pinot @ LinkedIn Tuesday, August 18, 15
(S)QL: Filters and Aggs SELECT
count(*) FROM companyFollowHistoricalEvents WHERE entityId = 121011 AND 'day' >= 15949 AND 'day' <= 15963 AND paid = 'y’ AND action = 'stop' Tuesday, August 18, 15
(S)QL: Group By SELECT count(*)
FROM companyFollowHistoricalEvents WHERE entityId = 121011 AND 'day' >= 15949 AND 'day' <= 15963 AND paid = 'y’ GROUP BY action Tuesday, August 18, 15
(S)QL: ORDER BY and LIMIT
SELECT * FROM companyFollowHistoricalEvents WHERE entityId = 121011 AND entityId = 1000 AND action = 'start' ORDER BY creationTime DESC LIMIT 1 Tuesday, August 18, 15
Whats not supported • JOIN:
unpredictable performance • NOT A SOURCE OF TRUTH • Mutation Tuesday, August 18, 15
Pinot • Data flow •
Query Execution • How to use/operate • Pinot @ LinkedIn - Future Tuesday, August 18, 15
Pinot Query Execution: Single Node
Architecture EXECUTION ENGINE INVERTED INDEX BITMAP INDEX COLUMN FORMAT PLANNER Tuesday, August 18, 15
Pinot Query Execution: Single Node
Architecture SELECT campaignId, sum(clicks) FROM Table A WHERE accountId = 121011 AND 'day' >= 15949 GROUP BY campaignId account Id daycampaign Id click Filter Operator Projection Operator Aggregation Group by Operator Combine Operator Pinot Segments Data sources Matching doc ids campaignId,Click tuple Tuesday, August 18, 15
Cluster Management: Deployment Helix Brokers
Servers • Brokers and Servers register themselves in Helix • All servers start with no use case specific configuration Controller Tuesday, August 18, 15
On boarding new use case
Helix Brokers Servers XLNT XLNT XLNT Create Table command Controller XLNT XLNTTag Servers TableName Brokers 3 XLNT_T1 1 Tuesday, August 18, 15
• AUTO recovery mode: Automatically
redistribute segments on failure/addition of new nodes • Custom mode: Run in degraded mode until node is restarted/replaced. Pinot - Fault tolerance/Elasticity Tuesday, August 18, 15
Pinot vs Druid Druid Pinot
Architecture Realtime + Offline, Realtime only Realtime + Offline Realtime only -> consistency is hard and schema evolution/Bootstrap is hard Inverted Index Always On all columns, Fixed Configurable on per column basis Allows trade off between scanning v/s inverted index + scanning. More data can be fit in given memory size Data organization N/A Sorts data Organizing data provides speed/better compression and removes the need for inverted index Smart pre- materialization N/A star-tree Allows trade off between latency and space Query Execution Layer Fixed Plan Split into Planning and execution Smart choices can be made at runtime based on metadata/query. Tuesday, August 18, 15
• Documentation & tooling •
In progress - consistency among real time replicas. • Improve cost to serve - leverage SSD, partial pre materialization • ThirdEye - Business Metrics Monitoring Pinot - Future Tuesday, August 18, 15