MySQL/Grid Cluster
16 April 2007
Google Tech Talks - MySQL Grid Cluster [1]
Handling petabytes' worth of data.
NDB nodes are where the data is stored. MySQL is just a client to the NDB nodes.
Originally MySQL Cluster was in-memory only.
Did you know that genealogy work is the driving force behind MySQL Clustering with petabyte storage?
Currently, multi-petabyte storage means tens of thousands of hard disks.
- A cluster has:
  - cluster nodes (MySQL server?)
  - storage controllers
    - handle the case where you don't want everything in one cluster
    - must be able to export all the data in one cluster
    - handle replication
  - cluster interconnect medium
  - data nodes (NDB nodes)
Replication works like disk-level RAID, but applied at the cluster level.
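A minimal sketch of the RAID analogy, assuming data nodes are arranged in groups: a hash stripes each key to one group (the RAID 0 part) and every node in that group keeps a copy (the RAID 1 part). The names `place` and `readable_from`, the node names, and the use of CRC32 as the hash are assumptions for illustration, not what NDB does internally.

```python
import zlib


def place(key: str, node_groups: list[list[str]]) -> list[str]:
    """Return the data nodes that should hold a copy of `key`."""
    # Striping (RAID 0 analogue): a stable hash picks exactly one node group.
    group_index = zlib.crc32(key.encode("utf-8")) % len(node_groups)
    # Mirroring (RAID 1 analogue): every node in that group stores the row.
    return list(node_groups[group_index])


def readable_from(key: str, node_groups: list[list[str]], failed: set[str]) -> list[str]:
    """Return the replicas of `key` that are still alive."""
    return [node for node in place(key, node_groups) if node not in failed]


# Hypothetical layout: two node groups with two replicas each.
groups = [["ndb1", "ndb2"], ["ndb3", "ndb4"]]
replicas = place("person:42", groups)
survivors = readable_from("person:42", groups, failed={replicas[0]})
print(replicas, survivors)
```

Losing one node of a group leaves the key readable from its mirror; losing a whole group loses that stripe, which is the same failure model as RAID 10.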
A single cluster scales up to around 50-100 nodes (about 750 TB). You also run into bandwidth problems.
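As a rough back-of-the-envelope check of those figures, the sketch below works out the storage each node would carry and how many such clusters a multi-petabyte target needs. It assumes every counted node is a data node, ignores replication overhead, and uses 1 PB = 1000 TB; the 5 PB target and the function names are made up for illustration.

```python
import math

CLUSTER_CAPACITY_TB = 750   # figure quoted in the notes
NODE_COUNTS = (50, 100)     # quoted scaling range


def per_node_tb(cluster_tb: float, nodes: int) -> float:
    """Storage each node carries if capacity is spread evenly."""
    return cluster_tb / nodes


def clusters_needed(target_tb: float, cluster_tb: float = CLUSTER_CAPACITY_TB) -> int:
    """Whole clusters required to hold `target_tb` of data."""
    return math.ceil(target_tb / cluster_tb)


for nodes in NODE_COUNTS:
    print(f"{nodes} nodes -> {per_node_tb(CLUSTER_CAPACITY_TB, nodes):.1f} TB per node")

# Hypothetical 5 PB target (5000 TB).
print(clusters_needed(5000), "clusters for 5 PB")
```

Under these assumptions a node holds between 7.5 and 15 TB, and 5 PB already needs 7 clusters, which is why a single cluster is not enough at petabyte scale.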
You have to think about caching issues at every level (OS, application, threads, disk) and about time-scheduling issues.
Need to be able to invalidate caches
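One simple way to picture cache invalidation is write-through with notification: the store tells every registered cache to drop a key as soon as it changes. This is only a sketch under that assumption; `BackingStore` and `ClientCache` are hypothetical names, and the talk notes do not say which protocol the Grid Cluster uses.

```python
class BackingStore:
    """Stands in for the data nodes; notifies caches on every write."""

    def __init__(self):
        self._data = {}
        self._caches = []

    def register(self, cache):
        self._caches.append(cache)

    def read(self, key):
        return self._data.get(key)

    def write(self, key, value):
        self._data[key] = value
        # Invalidate rather than update: each cache refetches lazily.
        for cache in self._caches:
            cache.invalidate(key)


class ClientCache:
    """Stands in for a client node's local cache."""

    def __init__(self, store):
        self._store = store
        self._entries = {}
        store.register(self)

    def get(self, key):
        if key not in self._entries:
            self._entries[key] = self._store.read(key)
        return self._entries[key]

    def invalidate(self, key):
        self._entries.pop(key, None)


store = BackingStore()
cache_a = ClientCache(store)
cache_b = ClientCache(store)
store.write("row:1", "old")
first = cache_a.get("row:1")    # cached by cache_a
store.write("row:1", "new")     # drops the stale entry everywhere
second = cache_a.get("row:1")
```

In a real cluster the notification crosses the interconnect, so its latency and failure handling become part of the protocol.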
Node failover issues
The RAID 10 problem is already solved.
File system option: FUSE (Filesystem in Userspace), which is quick to set up but not the most efficient.
Generally you want as many client nodes as data nodes, so that the CPUs are used efficiently. With MySQL you will want more MySQL Server client nodes than data nodes, because MySQL has a higher overhead: it has to handle the SQL protocol.
In the Grid of Clusters, the client nodes now talk to the storage controllers of the cluster nodes.
A Grid of Clusters scales up to around 50 clusters (about 25TByte).
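The client-to-storage-controller path might be pictured as below: the client never addresses data nodes directly, it picks a cluster and hands the request to that cluster's storage controller. The classes, the CRC32-based routing, and the `export_all` method (modelled on the "export all the data in one cluster" requirement) are assumptions for illustration only.

```python
import zlib


class StorageController:
    """Front door of one cluster; hides the data nodes behind it."""

    def __init__(self, name: str):
        self.name = name
        self._rows = {}

    def put(self, key: str, value):
        self._rows[key] = value

    def get(self, key: str):
        return self._rows.get(key)

    def export_all(self) -> dict:
        # Return a copy of everything this cluster holds.
        return dict(self._rows)


class GridClient:
    """Client node that routes each key to one cluster's controller."""

    def __init__(self, controllers):
        self._controllers = list(controllers)

    def _route(self, key: str) -> StorageController:
        index = zlib.crc32(key.encode("utf-8")) % len(self._controllers)
        return self._controllers[index]

    def put(self, key: str, value):
        self._route(key).put(key, value)

    def get(self, key: str):
        return self._route(key).get(key)


controllers = [StorageController(f"cluster{i}") for i in range(3)]
client = GridClient(controllers)
for n in range(10):
    client.put(f"person:{n}", {"id": n})
total_rows = sum(len(c.export_all()) for c in controllers)
```

Fixed modulo routing is the simplest choice here; adding or removing a cluster would move most keys, so a real grid would need a more stable mapping.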
- Major Technical Issues
  - Manageability of Grid Cluster
  - Easy Upgrading
  - Data flow, keeping Terabytes/second flowing in Grid Cluster
  - Response Time (=> tight integration with Cluster Interconnect)
  - Cache Invalidation Protocol
  - Clustered Replication Protocols
MySQL Cluster for Session Management