scaling « Artem Russakovskii's programming and technology blog


Hadoop Primer – Yet Another Hadoop Introduction

Posted by Artem Russakovskii on October 20th, 2008 in Databases, Programming

I just came upon a pretty good Hadoop introduction paper posted on Sun’s wiki. Apache Hadoop is a free Java software framework that supports data-intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers (per Wikipedia). I wouldn’t call it an alternative to MySQL; the two are in completely different weight categories. I like to think of Hadoop as a complement: in its functions, I think it’s closer to memcached than to MySQL. Perhaps it’s a hybrid of both, but a unique beast nonetheless. If you’re serious about scaling, you owe it to yourself to start exploring Hadoop yesterday.
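
To make the MapReduce idea a bit more concrete, here is a minimal single-process sketch of the classic word count in plain Python. It does not use Hadoop or any of its APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, the step a framework performs between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each `map_phase` call only looks at its own record and each `reduce_phase` call only looks at one key, both steps can be spread across machines, which is where the scaling comes from.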

A couple of …



MySQL Conference Liveblogging: Applied Partitioning And Scaling your (OLTP) Database System (Wednesday 11:55AM)

Posted by Artem Russakovskii on April 16th, 2008 in Databases

Phil Hilderbrand of thePlatform for Media, Inc. presents

classic partitioning

old school – union in the archive tables
auto partitioning and partition pruning
great for data warehousing
query performance improved
maintenance is clearly improved
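
As a rough illustration of what partition pruning buys you, here is a toy in-memory sketch in Python. It is not how MySQL implements partitioning; the class `RangePartitionedTable` and its methods are hypothetical, and the one-partition-per-year layout is an assumption chosen to keep the example short.

```python
from datetime import date


class RangePartitionedTable:
    """Toy table that keeps one list of rows per calendar year."""

    def __init__(self, years):
        self.partitions = {year: [] for year in years}

    def insert(self, row):
        # Route the row to the partition that owns its year.
        self.partitions[row["created"].year].append(row)

    def select_between(self, start, end):
        # Pruning: skip every partition whose year cannot match the range.
        scanned = [y for y in self.partitions if start.year <= y <= end.year]
        rows = [
            row
            for year in scanned
            for row in self.partitions[year]
            if start <= row["created"] <= end
        ]
        return rows, scanned


table = RangePartitionedTable([2006, 2007, 2008])
table.insert({"id": 1, "created": date(2006, 5, 1)})
table.insert({"id": 2, "created": date(2007, 8, 15)})
table.insert({"id": 3, "created": date(2008, 4, 16)})

rows, scanned = table.select_between(date(2008, 1, 1), date(2008, 12, 31))
```

The query only touches the 2008 partition, so the scan set shrinks with the date range instead of growing with the whole table. Archiving an old year also becomes a matter of dropping one partition.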

design issues in applying partitioning to OLTP (On-Line Transaction Processing)

often id driven access vs date driven access
1 big client could be 80% of the whole database, so it's difficult to select a partitioning scheme
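
The skew problem is easy to see with a few lines of Python. This is only a sketch: the data set (100 rows, 80 of them for client 1) is invented, `partition_sizes` is a hypothetical helper, and the modulo rule is a simple stand-in for hashing on a client id.

```python
def partition_sizes(client_ids, num_partitions):
    """Count how many rows land in each partition under id % num_partitions."""
    sizes = {p: 0 for p in range(num_partitions)}
    for client_id in client_ids:
        sizes[client_id % num_partitions] += 1
    return sizes


# Invented workload: client 1 owns 80 of the 100 rows.
rows = [1] * 80 + [2] * 5 + [3] * 5 + [4] * 5 + [5] * 5

sizes = partition_sizes(rows, 4)
largest_share = max(sizes.values()) / len(rows)
```

No matter how many partitions you add, all of the big client's rows end up together, so one partition stays at least as large as that client. That is why id-driven OLTP data is harder to partition than date-driven warehouse data.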

partitioning is only supported starting from MySQL 5.1

understanding the benefits

reducing seek and scan set sizes
improving insert/update durations
making maintenance easier

shows EXPLAIN output for SELECTs on non-partitioned and partitioned tables. The results are significantly better for the partitioned tables

OPTIMIZE TABLE on an unpartitioned table takes 1.14s

ALTER TABLE …



MySQL Conference Liveblogging: Portable Scale-out Benchmarks For MySQL (Wednesday 10:50AM)

Posted by Artem Russakovskii on April 16th, 2008 in Databases

Robert Hodges from Continuent presents

About Continuent

leading provider of open source database availability and scaling solutions

solutions

uni/cluster – multi-master database clustering that replicates data across multiple databases and load balances reads
uses "database virtualization"

scale-out design motivation

protection from db and site failures
continuous operation during upgrades

how come not everyone has it already?

creating identical replicas across different hosts is hard

Brewer's conjecture

trade-offs

DDL support
inconsistent reads between replicas
deadlocks
sequences
non-deterministic SQL

therefore many scale-out approaches are non-transparent
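
To show why non-deterministic SQL is on that list, here is a small Python sketch. It is not Continuent's design or any real replication protocol; the `Replica` class and the two `replicate_*` helpers are hypothetical, and the per-replica random generator stands in for something like RAND() being evaluated separately on each host.

```python
import random


class Replica:
    """Toy database node with its own source of randomness."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # stands in for per-host RAND() state
        self.rows = {}

    def apply_statement(self, key):
        # Statement-style replication: the node re-evaluates the expression.
        self.rows[key] = self.rng.random()

    def apply_row(self, key, value):
        # Row-style replication: the node stores the value it was sent.
        self.rows[key] = value


def replicate_statement(replicas, key):
    """Send the same non-deterministic statement to every replica."""
    for replica in replicas:
        replica.apply_statement(key)


def replicate_row(master, slaves, key):
    """Evaluate once on the master, then ship the resulting value."""
    master.apply_statement(key)
    for slave in slaves:
        slave.apply_row(key, master.rows[key])


a, b = Replica(seed=1), Replica(seed=2)
replicate_statement([a, b], "x")   # each node computes its own value
replicate_row(a, [b], "y")         # both nodes store the master's value
```

After the first call the two nodes disagree on `x`, while `y` matches on both. Avoiding that divergence means either restricting what the application may run or changing what gets shipped between nodes, and either choice is visible to the application.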

3 basic scale-out technologies

data replication

where are updates processed? master/master vs master/slave
when are updates replicated? sync vs async

group communication – coordinates messages between distributed processes

views – who is active, who is crashed, do

…

