Attached Code: https://github.com/iainkfraser/cache_transpose
Before we begin, the underlined words are briefly described in the glossary at the end. If there are any terms you don't understand let me know and I'll place them in the glossary as well.
As the wise contemporary philosophers the Wu-Tang Clan once said:
"Cache Rules Everything Around Me (C.R.E.A.M.)"
Many people believe this is just typical rappers exhibiting the materialistic nature of modern life. However, they were in fact visionaries describing the importance of memory hierarchy utilisation on processor performance.
Traditionally the time complexity of an algorithm was based on the RAM model. The model assumes that access to all memory locations takes equivalent constant time. In reality a cache hit is an order of magnitude faster than a cache miss.
Therefore it should be possible to significantly improve an algorithm by reordering its memory access to be cache friendly. This isn't a particularly novel idea and it has a name: cache aware algorithms.
The problem with cache aware algorithms is that they depend on the cache size. The algorithm must be tuned for every different cache it runs on. This unportable tuning was the inspiration for Harald Prokop's cache oblivious algorithms. They are algorithms that are cache aware but don't require tuning.
To illustrate cache-oblivious and cache-aware algorithms we first need an algorithm we can improve. The key is that we don't change the time complexity at all. That means we still do the same number of operations - so the processors do the same work. The only variant is the order of memory access and therefore cache utilisation.
Algorithms that access memory often are ripe for the picking. I decided to implement the matrix transpose because it is one of the simplest memory-heavy algorithms. The matrix transpose simply takes the rows of the input matrix and makes them the columns of the output matrix. For example, the 2x3 matrix with rows (1 2 3) and (4 5 6) becomes the 3x2 matrix with rows (1 4), (2 5) and (3 6).
You don't need to understand matrices at all to understand this algorithm. You only need to understand one definition:
![[\mathbf{A}^\mathrm{T}]_{ij} = [\mathbf{A}]_{ji}](http://upload.wikimedia.org/math/6/7/6/676a09fb68a5cfb70409594b8622e226.png)
This simply states that the entry in the jth row and ith column of the input matrix is equal to the entry in the ith row and jth column of the output matrix.
The cache-unaware, or naive, algorithm uses this definition directly, as shown below:
    /* Naive transpose: read in[][] row by row, write out[][] column by column. */
    for( int i = 0; i < N; i++ ){
        for( int j = 0; j < N; j++ ){
            out[j][i] = in[i][j];
        }
    }
Just a side note: the matrices here are large. In this blog's associated code N = 1024, so the matrix is 4 MB (each entry is 4 bytes, so 4 x 1024 x 1024 bytes). Anyway, back to the analysis. 2D arrays in C are stored in row-major order.
This means that an entry's horizontal neighbours (i.e. row neighbours) will be on the same cache line, unless the entry itself is at the end of a cache line. Therefore accessing a horizontal neighbour will likely be a cache hit. Conversely, accessing a vertical neighbour will likely be a cache miss (unless it's a relatively small matrix whose row is small enough to fit in a cache line).
When the naive algorithm reads the input matrix it will often cache hit, because it traverses in row order (the inner iterator j walks along a row, one column at a time). However, writing to the output matrix will likely cache miss because the output is traversed in column order.
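To make the row-versus-column difference concrete, here is a rough sketch (not taken from the attached code) that counts how many cache lines a walk through the matrix touches. The 64-byte line size, the 4-byte entry size, the assumption that the matrix starts on a line boundary, and the name `lines_touched` are all mine, purely for illustration.

```c
#include <stddef.h>

/* Assumed for illustration: 64-byte cache lines, 4-byte entries, and a
   matrix whose first entry sits at the start of a cache line. */
enum { LINE_BYTES = 64, ENTRY_BYTES = 4 };

/* Count the cache lines touched by `count` accesses to an n x n row-major
   matrix, starting at entry (row, col) and advancing `stride` entries per
   access (stride 1 walks along a row, stride n walks down a column).
   Offsets only ever increase, so counting line changes between consecutive
   accesses is the same as counting distinct lines. */
size_t lines_touched(size_t n, size_t row, size_t col,
                     size_t stride, size_t count)
{
    size_t touched = 0;
    size_t prev_line = (size_t)-1;   /* sentinel: no line visited yet */

    for (size_t k = 0; k < count; k++) {
        size_t byte_offset = (row * n + col + k * stride) * ENTRY_BYTES;
        size_t line = byte_offset / LINE_BYTES;
        if (line != prev_line) {
            touched++;
            prev_line = line;
        }
    }
    return touched;
}
```

Under those assumptions, with N = 1024, `lines_touched(1024, 0, 0, 1, 16)` gives 1 (sixteen row neighbours share one line) while `lines_touched(1024, 0, 0, 1024, 16)` gives 16 (every step down a column lands on a new line).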
How can this be improved? Well, we can change the order of writing to minimize the cache misses. That is exactly what the cache-aware algorithm does. It splits the matrix into a set of sub-matrices whose size matches the cache line, then runs the naive algorithm on each sub-matrix.
Don't let the sub-matrices confuse you. We are still essentially following the same definition:
![[\mathbf{A}^\mathrm{T}]_{ij} = [\mathbf{A}]_{ji}](http://upload.wikimedia.org/math/6/7/6/676a09fb68a5cfb70409594b8622e226.png)
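A minimal sketch of this blocked approach might look like the following. It is my own illustration rather than the code from the attached repository: the function name `transpose_blocked`, the flat row-major `int` arrays and the `block` parameter (the side length of each sub-matrix, which must be at least 1) are all assumptions.

```c
#include <stddef.h>

/* Cache-aware transpose of an n x n row-major matrix of ints.
   `block` is the tuning parameter: the side length of each sub-matrix,
   e.g. 16 if a 64-byte cache line holds sixteen 4-byte entries. */
void transpose_blocked(const int *in, int *out, size_t n, size_t block)
{
    for (size_t bi = 0; bi < n; bi += block) {
        for (size_t bj = 0; bj < n; bj += block) {
            /* Clamp the block at the matrix edge when block does not divide n. */
            size_t i_end = bi + block < n ? bi + block : n;
            size_t j_end = bj + block < n ? bj + block : n;

            /* The naive transpose, restricted to one sub-matrix. */
            for (size_t i = bi; i < i_end; i++) {
                for (size_t j = bj; j < j_end; j++) {
                    out[j * n + i] = in[i * n + j];
                }
            }
        }
    }
}
```

Each sub-matrix still obeys out[j][i] = in[i][j]; only the order in which entries are visited changes, so the writes to the output stay within a handful of cache lines at a time.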
The problem with the previous cache aware algorithm is it took the cache line size as a parameter. So we would have to tune it for each different target. Can we eliminate this tuning? Sure we can! The cache oblivious (CO) way.
The CO algorithm basically keeps splitting the matrix into sub-blocks until we reach blocks of size 1. In reality you stop at a size slightly larger than one to minimize the cost of recursion. Then you apply the transpose on the blocks. So the CO algorithm is basically the cache-aware algorithm, but the tuning is done automatically through divide and conquer.
But wait, isn't setting the block size too small going to cost a cache miss? No, that's the clever bit! If the block is smaller than the cache line then the siblings, and eventually the parent block (the block that encompasses the sub-block), will still be in cache.
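Here is a rough sketch of the divide-and-conquer idea, again my own illustration and not necessarily how the attached code does it. It halves the longer side of the current block until both sides fall under a small cutoff. The names `transpose_rec` and `transpose_oblivious` and the cutoff value `CO_CUTOFF` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical cutoff: stop recursing once a block is this small, so the
   cost of the recursion does not dominate. It is not tied to the cache. */
#define CO_CUTOFF 8

/* Transpose the rows x cols sub-matrix of `in` whose top-left corner is
   (r0, c0) into `out`. Both matrices are n x n and row-major. */
static void transpose_rec(const int *in, int *out, size_t n,
                          size_t r0, size_t c0, size_t rows, size_t cols)
{
    if (rows <= CO_CUTOFF && cols <= CO_CUTOFF) {
        /* Small enough: apply the naive transpose to this block. */
        for (size_t i = r0; i < r0 + rows; i++)
            for (size_t j = c0; j < c0 + cols; j++)
                out[j * n + i] = in[i * n + j];
    } else if (rows >= cols) {
        /* Split the block into a top half and a bottom half. */
        size_t half = rows / 2;
        transpose_rec(in, out, n, r0, c0, half, cols);
        transpose_rec(in, out, n, r0 + half, c0, rows - half, cols);
    } else {
        /* Split the block into a left half and a right half. */
        size_t half = cols / 2;
        transpose_rec(in, out, n, r0, c0, rows, half);
        transpose_rec(in, out, n, r0, c0 + half, rows, cols - half);
    }
}

/* Cache-oblivious transpose of an n x n row-major matrix of ints. */
void transpose_oblivious(const int *in, int *out, size_t n)
{
    transpose_rec(in, out, n, 0, 0, n, n);
}
```

Notice that no cache line size appears anywhere: at some depth of the recursion the blocks happen to fit whatever cache the machine has, which is the whole point.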
So let's see if the Wu-Tang hypothesis is correct. The average results of running the transpose on a 1024x1024 matrix 100 times are:
| Naive | Cache Aware | Cache Oblivious |
|---|---|---|
| 5704 ms | 763.33 ms | 1212 ms |
Of course the Wu-Tang Clan was right on the money. The algorithms all perform exactly the same number of operations, yet the difference is huge. The cache oblivious algorithm is slightly slower because it's paying for the recursion. As the size of the matrix grows the relative cost of recursion will reduce. So we can say that the cache oblivious algorithm is asymptotically equivalent to the cache aware one.
This blog has given a very broad overview. For more information please read Harald Prokop's MIT master's thesis and the attached source code.
For the people that do read the thesis, one thing that had me confused was that the cache complexity was often stated as O( 1 + k / N ) and I couldn't figure out where the 1 came from. It's because big-O represents the worst case, and the worst case in terms of cache is when the memory address is misaligned, so you have to pay for an additional cache miss per object access.
So the take home for today is, when your code runs slow don't forget the cache! Until next time keep it real...
Glossary
Time Complexity - A way of comparing algorithms mathematically. Instead of comparing them by actual time measurements, which are affected by the computer the algorithm is run on, we compare them by the expected number of operations. The number of operations typically depends on the size of the input n, so we say an algorithm's complexity is a function of n.
Cache Hit - When the CPU attempts to access an object that's already in cache.
Cache Miss - When the CPU attempts to access an object that's not in cache so it must be copied from memory to cache ( a line size at a time ).
Row Major - A 2D (or n-dimensional) array in C is actually stored as one large single array. In row-major order the rows are written out one after another. So for example if the array is defined as x[3][5] and you access element x[2][4], it actually sits at offset 2*5 + 4 = 14, because you have to skip the first two rows and then move 4 elements along the third row.
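The row-major arithmetic above can be written as a one-line helper. This is just an illustrative sketch, and the name `row_major_offset` is my own.

```c
#include <stddef.h>

/* Offset (in elements) of entry [row][col] in a row-major 2D array
   that has `cols` columns per row. */
size_t row_major_offset(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}
```

For the x[3][5] example, `row_major_offset(2, 4, 5)` gives 14.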
Cool article with good illustration, please keep writing!
thank-you ... I will try to blog about every Sunday project I take on. Next Sun is going to be on linear programming or indexing. I may write an article on JIT from the Sunday before last :)
I believe that cache-oblivious algorithms are asymptotically optimal as the number of cache levels in the hierarchy increases, not asymptotically optimal as the input size increases.
In "The Cost of Cache-Oblivious Searching" (Proceedings. 44th Annual IEEE Symposium on Foundations of Computer Science, 2003), Bender et al. prove that cache-oblivious search structures take (lg e) or approximately 44% more block transfers than the optimal cache-aware algorithm in the 2-level memory model (the DAM model) and can do no better. As the number of levels in the memory model increases, the performance loss of the cache-oblivious algorithm approaches zero.
Of course real hardware does not have a 2-level memory hierarchy nor does it have an infinite level memory hierarchy. But I have found that in practice, the 2-level memory model is a good abstraction for application performance. There is usually one "jump" in the memory hierarchy that has a significant performance impact. Either the gap from disk to main memory, or the gap from main memory to highest cache level, depending on the application.
Thank you for letting me know. I'll read that on Sat and update with the correction.
Awesome article! Interesting note: when I ran the code, the cache oblivious was faster than the cache-aware (compiled with gcc -O3 and run on an i7).
naive : 1696 1707 1707
cache aware : 798 800 803
cache oblivous: 364 366 364
It's oblivously better!
ha!
Thanks Danny ... A friend of mine also noticed similar results. Unfortunately I must restrict myself to Sundays for tangent work so I really can't do more research on the cause.
If you are interested in finding out, before studying the optimised output make sure:
1) You have set the cache line parameter for the aware algorithm appropriately
2) You have verified that the matrix begins on a page boundary using mmap (so that it is cache aligned)
If you do find out, please let me know and I'll update this blog.
Thanks
Worth noting is that the ratios of performance increase in the three cases are as follows:
naive: 3.35 times faster
cache aware: 0.954 times as fast
cache oblivious: 3.32 times faster
So that supports the hypothesis that your test machine is around 3.3 times faster than the one used for the article, but you have incorrectly configured the cache size or alignment in the cache-aware case, in one of the ways Iain describes above.
Incidentally, this is a good anecdotal argument as to why the cache-oblivious algorithm is a better choice in general, because it's harder to misconfigure.
get the memory, mega mega bytes ya'll!
bravo :)
Well-written. Made my day.
(memory) Word!