<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>beer planet &#187; logging</title>
	<atom:link href="http://beerpla.net/tag/logging/feed/" rel="self" type="application/rss+xml" />
	<link>http://beerpla.net</link>
	<description>where things have nothing to do with beer - tutorials, tips, how-tos, thoughts, hacks, and other techy nonsense</description>
	<lastBuildDate>Sun, 08 Aug 2010 23:59:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<atom:link rel='hub' href='http://beerpla.net/?pushpress=hub'/>
		<item>
		<title>MySQL Conference Liveblogging: Optimizing MySQL For High Volume Data Logging Applications (Thursday 2:50PM)</title>
		<link>http://beerpla.net/2008/04/17/mysql-conference-liveblogging-optimizing-mysql-for-high-volume-data-logging-applications-thursday-250pm/</link>
		<comments>http://beerpla.net/2008/04/17/mysql-conference-liveblogging-optimizing-mysql-for-high-volume-data-logging-applications-thursday-250pm/#comments</comments>
		<pubDate>Thu, 17 Apr 2008 21:56:06 +0000</pubDate>
		<dc:creator>Artem Russakovskii</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[application]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[high volume]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[optimize]]></category>
		<category><![CDATA[scale]]></category>
		<guid isPermaLink="false">http://beerpla.net/2008/04/17/mysql-conference-liveblogging-optimizing-mysql-for-high-volume-data-logging-applications-thursday-250pm/</guid>
		<description><![CDATA[<ul>
<li><a title="http://en.oreilly.com/mysql2008/public/schedule/detail/874" href="http://en.oreilly.com/mysql2008/public/schedule/detail/874">http://en.oreilly.com/mysql2008/public/schedule/detail/874</a></li>
<li>presented by <a href="http://en.oreilly.com/mysql2008/public/schedule/speaker/1287">Charles Lee</a> of <a href="http://hyperic.com/">Hyperic</a></li>
<li>Hyperic has the best performance with MySQL out of MySQL, Oracle, and Postgres in their application</li>
<li><em>I suddenly remember that Hyperic was highly recommended over Nagios in </em><a href="http://beerpla.net/2008/04/16/mysql-conference-liveblogging-monitoring-tools-wednesday-515pm/"><em>MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)</em></a></li>
<li>performance bottleneck</li>
<ul>
<li>the database</li>
<ul>
<li>CPU</li>
<li>memory</li>
</ul>
<li>IO</li>
<ul>
<li>disk latency</li>
<li>network latency</li>
</ul>
<li>slow queries</li>
</ul>
<li>medium-size deployment example</li>
<ul>
<li>300 platforms (300 remote agents collecting data)</li>
<li>2,100 servers</li>
<li>21,000 services (10 services per server), <em>sounds feasible</em></li>
<li>468,000 metrics (20 metrics per service)</li>
<li>28,800,000 metric data rows per day</li>
<li>larger deployments have a</li></ul></ul><p>...<div class=clear></div> <a href="http://beerpla.net/2008/04/17/mysql-conference-liveblogging-optimizing-mysql-for-high-volume-data-logging-applications-thursday-250pm/" class="read_more"><div class=excerpt-end>Read the rest of this article &#187;</div></a></p>]]></description>
			<content:encoded><![CDATA[<ul>
<li><a title="http://en.oreilly.com/mysql2008/public/schedule/detail/874" href="http://en.oreilly.com/mysql2008/public/schedule/detail/874">http://en.oreilly.com/mysql2008/public/schedule/detail/874</a></li>
<li>presented by <a href="http://en.oreilly.com/mysql2008/public/schedule/speaker/1287">Charles Lee</a> of <a href="http://hyperic.com/">Hyperic</a></li>
<li>Hyperic has the best performance with MySQL out of MySQL, Oracle, and Postgres in their application</li>
<li><em>I suddenly remember that Hyperic was highly recommended over Nagios in </em><a href="http://beerpla.net/2008/04/16/mysql-conference-liveblogging-monitoring-tools-wednesday-515pm/"><em>MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)</em></a></li>
<li>performance bottleneck</li>
<ul>
<li>the database</li>
<ul>
<li>CPU</li>
<li>memory</li>
</ul>
<li>IO</li>
<ul>
<li>disk latency</li>
<li>network latency</li>
</ul>
<li>slow queries</li>
</ul>
<li>medium-size deployment example</li>
<ul>
<li>300 platforms (300 remote agents collecting data)</li>
<li>2,100 servers</li>
<li>21,000 services (10 services per server), <em>sounds feasible</em></li>
<li>468,000 metrics (20 metrics per service)</li>
<li>28,800,000 metric data rows per day</li>
<li>larger deployments have a lot more of these (<em>sounds crazy</em>)</li>
</ul>
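<li>As a rough sanity check on the daily row count above (plain arithmetic, not something shown in the talk), 28,800,000 rows per day is a steady 20,000 rows per minute:

```python
# Daily row count quoted in the deployment example.
ROWS_PER_DAY = 28_800_000
MINUTES_PER_DAY = 24 * 60

rows_per_minute = ROWS_PER_DAY // MINUTES_PER_DAY
rows_per_second = ROWS_PER_DAY / (MINUTES_PER_DAY * 60)

print(rows_per_minute)            # 20000
print(round(rows_per_second, 1))  # 333.3
```
</li>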
<li>data</li>
<ul>
<li>measurement_id</li>
<li>timestamp</li>
<li>value</li>
<li>primary key (timestamp, measurement_id)</li>
</ul>
<li>data flow</li>
<ul>
<li>agent collects data and sends reports to server with multiple data points</li>
<li>server batch inserts metric data points</li>
<li>if the network connection fails, the agent continues to collect data, while the server &#034;backfills&#034; the missing data points as unavailable</li>
<li>when agent reconnects, spooled data overwrite backfilled data points (<em>why not use REPLACE for all inserts?</em>)</li>
</ul>
<li><em>things are very basic so far</em></li>
<li>batch insert</li>
<ul>
<li>INSERT INTO TABLE (a,b,c) VALUES (0,0,0), (1,1,1),&#8230;</li>
<li>MySQL batch insert statements seem to improve overall performance by about 30% compared to the prepared statements with multiple queries used for other databases</li>
<li>batch inserts are limited by &#039;max_allowed_packet&#039;</li>
</ul>
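<li>A minimal Python sketch of building such a multi-row INSERT, assuming a driver that takes %s placeholders. The table name metric_data, the column list, and both helper names are made up for illustration and are not from the talk. Chunking by row count is one simple way to keep each statement comfortably below max_allowed_packet:

```python
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_batch_insert(table, columns, rows):
    """Return (sql, params) for one multi-row INSERT using %s placeholders."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_placeholder] * len(rows))
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


# Hypothetical data points: (measurement_id, timestamp, value).
points = [(1, 1000, 0.5), (2, 1000, 0.7), (3, 1000, 0.9)]
columns = ["measurement_id", "timestamp", "value"]
statements = [
    build_batch_insert("metric_data", columns, batch)
    for batch in chunked(points, 2)
]
print(statements[0][0])
# INSERT INTO metric_data (measurement_id, timestamp, value) VALUES (%s, %s, %s), (%s, %s, %s)
```

Each (sql, params) pair would then be handed to the driver as a single statement.</li>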
<li>other options for increasing insert speed</li>
<ul>
<li>set unique_checks=0, insert, set unique_checks=1 (<em>definitely need to make sure data is valid first</em>)</li>
<li>set foreign_key_checks=0, insert, set foreign_key_checks=1 (<em>same concerns as above</em>)</li>
<li>Hyperic doesn&#039;t use the 2 above</li>
</ul>
<li>INSERT &#8230; ON DUPLICATE KEY UPDATE</li>
<ul>
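<li>when a regular INSERT fails, retry the batch with the INSERT &#8230; ON DUPLICATE KEY UPDATE syntax</li>
<li>it&#039;s much slower but it allows the batch to go through when some rows already exist (e.g. backfilled data points)</li>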
</ul>
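<li>The retry logic can be sketched in Python roughly as follows. This is an illustration under assumptions, not Hyperic code: DuplicateKeyError stands in for whatever the real driver raises, metric_data is a made-up table name, and fake_execute is a toy in-memory replacement for a database connection so the example runs on its own:

```python
class DuplicateKeyError(Exception):
    """Stand-in for the driver's duplicate-key error (hypothetical name)."""


def build_insert(rows, upsert=False):
    """Build a multi-row INSERT, optionally with ON DUPLICATE KEY UPDATE."""
    placeholders = ", ".join(["(%s, %s, %s)"] * len(rows))
    sql = ("INSERT INTO metric_data (measurement_id, timestamp, value) "
           "VALUES " + placeholders)
    if upsert:
        sql += " ON DUPLICATE KEY UPDATE value = VALUES(value)"
    params = [v for row in rows for v in row]
    return sql, params


def insert_batch(execute, rows):
    """Try the fast plain INSERT; on a duplicate key, retry the batch as an upsert."""
    try:
        execute(*build_insert(rows))
        return "insert"
    except DuplicateKeyError:
        execute(*build_insert(rows, upsert=True))
        return "upsert"


# Toy executor keyed on (timestamp, measurement_id), mimicking the primary key.
store = {}


def fake_execute(sql, params):
    rows = [tuple(params[i:i + 3]) for i in range(0, len(params), 3)]
    upsert = "ON DUPLICATE KEY UPDATE" in sql
    if not upsert and any((ts, mid) in store for mid, ts, _ in rows):
        raise DuplicateKeyError()
    for mid, ts, value in rows:
        store[(ts, mid)] = value


first = insert_batch(fake_execute, [(1, 1000, None)])                  # backfilled point
second = insert_batch(fake_execute, [(1, 1000, 0.5), (2, 1000, 0.7)])  # spooled data
print(first, second)  # insert upsert
```
</li>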
<li><em>this is all basic, where are the performance tweaks?!</em></li>
<li>batch aggregate inserter</li>
<ul>
<li>queue metric data from separate agent reports</li>
<ul>
<li>minimize number of inserts, connections, CPU load</li>
<li>maximize workload efficiency</li>
</ul>
<li>optimal configuration for 700 agents</li>
<ul>
<li>3 workers</li>
<li>2000 batch size seems to work best</li>
<li>queue size of 4,000,000</li>
</ul>
<li>this seems to peak at 2.2mil metric data inserts per minute</li>
</ul>
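<li>A toy Python sketch of the queue-and-workers idea, assuming nothing about the actual Hyperic implementation: points from separate agent reports go into one shared queue, and a few workers drain it in large batches. The constants are the figures quoted above; the function names and the flush callback are hypothetical:

```python
import queue
import threading

BATCH_SIZE = 2000        # batch size quoted in the talk
WORKERS = 3              # worker count quoted in the talk
QUEUE_SIZE = 4_000_000   # queue size quoted in the talk


def drain_batch(q, batch_size):
    """Take up to batch_size queued points without blocking."""
    batch = []
    while len(batch) != batch_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def worker(q, flush, stop):
    """Flush batches until the producers are done and the queue is empty."""
    while True:
        stopping = stop.is_set()   # read before draining to avoid losing late items
        batch = drain_batch(q, BATCH_SIZE)
        if batch:
            flush(batch)           # in real life: one batch INSERT per call
        elif stopping:
            return
        else:
            stop.wait(0.01)


q = queue.Queue(maxsize=QUEUE_SIZE)
for i in range(5000):              # pretend these came from many agent reports
    q.put((i, 1000, 0.0))

batch_sizes = []
stop = threading.Event()
stop.set()                         # no more producers in this toy run
threads = [
    threading.Thread(target=worker,
                     args=(q, lambda b: batch_sizes.append(len(b)), stop))
    for _ in range(WORKERS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(batch_sizes))            # 5000
```
</li>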
<li>data consolidation</li>
<ul>
<li>inspired by rrdtool</li>
<li>lower resolution tables track min, avg, and max</li>
<li>data compression runs hourly</li>
<li>raw data tables have a size limit of 2 days</li>
<li>every hour, data is rolled up into another table that holds hourly aggregated values with size limit 14 days, then that one gets rolled up into a monthly table, etc</li>
<li><em>this is a good approach if you don&#039;t care about each data point</em></li>
</ul>
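<li>The hourly roll-up can be pictured with a small Python sketch. It assumes data points shaped like the (measurement_id, timestamp, value) rows described earlier, with timestamps in seconds; the function name and bucket layout are illustrative only:

```python
from collections import defaultdict


def roll_up(points, bucket_seconds=3600):
    """Aggregate (measurement_id, timestamp, value) points into min/avg/max per bucket."""
    buckets = defaultdict(list)
    for measurement_id, timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[(measurement_id, bucket_start)].append(value)
    return {
        key: (min(vals), sum(vals) / len(vals), max(vals))
        for key, vals in buckets.items()
    }


# Hypothetical raw points: two in the first hour, one in the second.
raw = [(1, 0, 2.0), (1, 1800, 4.0), (1, 3600, 6.0)]
hourly = roll_up(raw)
print(hourly[(1, 0)])     # (2.0, 3.0, 4.0)
print(hourly[(1, 3600)])  # (6.0, 6.0, 6.0)
```

The same function with a larger bucket_seconds would produce the next, coarser level from the hourly rows, at the cost of losing the individual data points.</li>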
<li><em>I&#039;m overwhelmed by the amount of &#034;you know&#034;s from the speaker. Parasite words, ahh! Sorry Charles <img src='http://beerpla.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></li>
<li>software partitioning</li>
<ul>
<li>measurement data split into 18 tables, representing 9 days (2 per day)</li>
<li>they didn&#039;t want to do more than 2 SELECTs to get data per day, hence such sharding</li>
<li><em>oddly, Charles didn&#039;t actually use the word &#039;shard&#039; once</em></li>
<li>tables truncated, rather than deleting rows =&gt; huge performance boost</li>
<li>truncation vs deletion</li>
<ul>
<li>deletion causes contention on rows </li>
<li>truncation doesn&#039;t produce fragmentation</li>
<li>truncation just drops and recreates the table &#8211; single DDL operation</li>
</ul>
</ul>
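<li>One way to picture this scheme in Python, as a sketch under assumptions: the talk only gave the table count and the 2-per-day split, so the table naming, the modulo mapping from timestamp to table, and the helper names here are all guesses for illustration. Timestamps are Unix seconds and day_start is assumed to fall on a UTC midnight:

```python
TABLE_COUNT = 18                 # 18 tables, as in the talk
SLOT_SECONDS = 12 * 60 * 60      # 2 tables per day


def table_for(timestamp):
    """Map a Unix timestamp to the (hypothetical) table holding that half-day."""
    slot = (timestamp // SLOT_SECONDS) % TABLE_COUNT
    return "metric_data_{:02d}".format(slot)


def select_for_day(day_start):
    """Build one query for a day: at most 2 tables, combined with UNION ALL."""
    tables = [table_for(day_start), table_for(day_start + SLOT_SECONDS)]
    return " UNION ALL ".join(
        "SELECT measurement_id, timestamp, value FROM " + t for t in tables
    )


def purge_statement(timestamp):
    """Reclaim an expired slot with one TRUNCATE instead of row-by-row DELETEs."""
    return "TRUNCATE TABLE " + table_for(timestamp)


print(table_for(0))                 # metric_data_00
print(table_for(9 * 24 * 60 * 60))  # metric_data_00 (wraps around after 9 days)
print(purge_statement(12 * 60 * 60))  # TRUNCATE TABLE metric_data_01
```
</li>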
<li>indexes</li>
<ul>
<li>every <strong>InnoDB</strong> table has a special index called the <strong>clustered index</strong> (based on primary key) where the physical data for the rows is stored</li>
<li>advantages</li>
<ul>
<li>selects faster &#8211; row data is on the same page where the index search leads</li>
<li>inserts in (timestamp) order &#8211; avoid page splits and fragmentation</li>
</ul>
<li>shows comparison between non-clustered index and clustered index (see slides)</li>
</ul>
<li><em>still no mention of configuration tweaks</em></li>
<li>UNION ALL works better than inner SELECTs (subqueries) because the optimizer didn&#039;t optimize them well enough (at least in the version these guys are using, not sure which)</li>
<li><em>recommended server options are on the very last slide, I was waiting for those the most! I guess I&#039;ll look up the slides after</em></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://beerpla.net/2008/04/17/mysql-conference-liveblogging-optimizing-mysql-for-high-volume-data-logging-applications-thursday-250pm/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
