Saturday, 16 September 2017

Unix design patterns and philosophy

Design patterns became very popular after the release of the Design Patterns: Elements of Reusable Object-Oriented Software book, and the topic grew into one of the advanced areas of software engineering.

It became one of those must-have books to keep on your desk, and it gave us a common vocabulary to talk about design. The design pattern idea was later extended to capture solutions to all kinds of recurring problems.

Linda Rising took the same idea and wrote Fearless Change: Patterns for Introducing New Ideas, followed by a part two, More Fearless Change, and I am sure many more books will come in the future that present the pattern idea in different shapes and forms.


In this blog I will share the design patterns used in UNIX.
All of these patterns are 30+ years old, but they are still highly relevant in the age of microservices.

Let's start with a famous quote from C. A. R. Hoare:

"There are two ways of constructing a software design. One is to make it so simple that there are obviously no deficiencies; the other is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."

The Emperor’s Old Clothes, CACM February 1981
—C. A. R. Hoare

Bottom up design


The first design philosophy is that UNIX is built using a bottom-up approach: build small Lego-like blocks and then compose them to create something more useful. Functional programming is based on the same principle!

Make each program do one thing well 

This is such a simple rule, but it is not followed enough. Developers love complexity, and this rule is violated every day, resulting in software that does too many things while implementing many of them poorly.

Some UNIX examples of doing one thing well are grep, diff, patch, and Yacc.
These programs are practically bug-free, so much so that their functionality is taken for granted.




Design programs to be connected with other programs(Composition)

This is the rule that gave us the PIPE (i.e. |). If our programs are built using this rule, a lot of the time and money spent on integration can be saved.
In the UNIX world binary input/output is avoided, and this makes integration very easy.
UNIX programs separate interface from logic: the interface consumes a text stream and produces a text stream.

Each program is independent, and they can be composed together because they all follow the simple rule of a text interface.
  
As developers we like to optimize prematurely and start with binary input/output; use a binary format only when a bottleneck is proven.
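As a small illustration of this rule (a sketch I am adding, not code from the original post), here is a Java program written as a Unix-style filter: it reads text lines from stdin, does exactly one thing, and writes text to stdout, so it can be composed with any other tool through a pipe such as "cat names.txt | java Upper | sort". The class name Upper and the file name are made up for the example.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// A minimal Unix-style filter in Java: text in, text out, one job only.
// Because the interface is plain text, it composes with any other tool via pipes.
public class Upper {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line.toUpperCase());
        }
    }
}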


 

Build software that can be tried early, maybe in a few days or a week.

This philosophy is applicable even to complex software like operating systems and device drivers. Today we need the help of Agile to build software that can be tried in 2 or 3 weeks.
The UNIX guys were doing Agile in 1970.

Avoid fancy algorithms

Fancy algorithms are complex to implement, tend to have bugs, and pay dividends only when N is above some threshold. A lot of complex problems can be solved with simple algorithms and data structures.

Data dominates 
Spend more time on building the data structures that organize your data properly.


Write simple parts connected by clean interfaces
This is the most difficult one to get right. We have seen different programming paradigms, from assembler to structured, object-oriented, and functional, but all of them have failed at this: as soon as a program turns from a POC into a real system, it no longer fits inside a human head.


Clarity over Cleverness 

I am sure every developer has a war story to tell about debugging a clever program.


Rule of Transparency
This is one of those things that is thought about only on the last day of development, in spite of knowing that the whole point of this rule is to make inspection and debugging of a program easier.


Rule of Repair

When you fail, fail noisily and as soon as possible.

Single Point of Truth (SPOT) rule

This rule says that every piece of knowledge must have a single, unambiguous representation in the system. Repetition leads to inconsistency, so use constants, tables, metadata, and code generators. Complexity is a cost; don't pay it twice.


All of this philosophy boils down to one thing: keep it simple.


Monday, 24 July 2017

Java Intrinsic magic

I got a question on the atomicinteger-java-7-vs-java-8 blog post about the implementation of the Unsafe.getAndAddInt function.



It looks like JDK 8 is still using the CAS version.
If you just look at the code in an IDE then you are correct, but the JVM's HotSpot compiler is more intelligent than that.

Code goes through various optimizations before it is executed; the list of optimizations is huge and would require a blog series just to touch on it.

In this blog I will discuss intrinsic functions.

As per Wikipedia:


In compiler theory, an intrinsic function is a function available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it substitutes a sequence of automatically generated instructions for the original function call, similar to an inline function. Unlike an inline function though, the compiler has an intimate knowledge of the intrinsic function and can therefore better integrate it and optimize it for the situation. This is also called builtin function in many languages.
Compilers that implement intrinsic functions generally enable them only when the user has requested optimization, falling back to a default implementation provided by the language runtime environment otherwise.


So in other words, some functions behave like built-in (native) functions handled by the compiler itself, which is different from the native functions written by us.


The compiler decides to use the special/intrinsic version if it is available; otherwise it falls back to the default implementation.

Unsafe.getAndAddInt is a special function that takes advantage of this optimization, but the normal Java code is still required for the cases when the optimization is not requested or not available.
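The JDK 8 fallback is essentially a CAS loop. Below is a rough, self-contained sketch of the same idea written against the public AtomicInteger API (the real Unsafe code uses getIntVolatile and compareAndSwapInt internally; the class and method names here are mine):

import java.util.concurrent.atomic.AtomicInteger;

public class CasLoopSketch {
    // Same shape as the Unsafe.getAndAddInt fallback: read the current value,
    // try to CAS in the new one, and retry if another thread won the race.
    // When the intrinsic is used, HotSpot emits a single atomic add (lock xadd on x86) instead.
    static int getAndAdd(AtomicInteger counter, int delta) {
        int current;
        do {
            current = counter.get();
        } while (!counter.compareAndSet(current, current + delta));
        return current;
    }

    public static void main(String[] args) {
        AtomicInteger c = new AtomicInteger(41);
        System.out.println(getAndAdd(c, 1) + " -> " + c.get()); // prints 41 -> 42
    }
}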

To verify this we have to look at the generated assembly code.

The how-to-print-dissasembly-from-jit-code article has all the details you need for this.

You can also download the dll/lib for Windows/Linux from GitHub.

For the code snippet below, the optimization kicks in.
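The original screenshot is not reproduced here, but a minimal snippet along these lines, a hot loop hammering AtomicInteger.getAndIncrement so that the method gets JIT-compiled and the intrinsic kicks in, is enough (the class name and iteration count are illustrative):

import java.util.concurrent.atomic.AtomicInteger;

// Run with something like:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly IntrinsicDemo
// (PrintAssembly needs the hsdis disassembler library to be available to the JVM.)
public class IntrinsicDemo {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        long total = 0;
        // Enough iterations for the loop to get hot and be compiled by C2.
        for (int i = 0; i < 10_000_000; i++) {
            total += counter.getAndIncrement();
        }
        System.out.println(total);
    }
}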

The assembly code generated for the above code looks like the listing below (look for the xadd instruction).



Java maintains the list of all intrinsic functions in the code @ vmSymbols.hpp

The HotSpot compiler is full of magic :-)

Tuesday, 27 June 2017

Programmer Oath

Uncle Bob is famous for his ideas on clean code. He is a hardliner on test-driven development.

I am a big follower of his and have read a few of his books, Clean Code and The Clean Coder; I have become a better programmer after reading them.

I read The Clean Coder a few years back and felt that it is a must-read for every professional software developer.

Uncle Bob recently gave a talk, The Scribe's Oath, that summarizes the whole book in one hour.

In this talk he sets a nice context before getting to the real stuff. He talks about a set of oaths for programmers.

Not writing harmful code 
He talks about the harm software developers do by releasing defects and by making code harder for other programmers to understand. Software should be easy to change, and anything that makes it hard to change is harmful code.

The code that I produce will be my best work
This is something that you can be proud of.

Each release comes with quick, sure & repeatable proof that every element of the code works as it is supposed to.
This is where all the TDD, BDD, CI etc. comes in. It is inappropriate to accept some level of defects.

Make frequent small releases and do not impede progress
Holding code to yourself and not letting anybody see it, or holding on to big commits, are a few of the things that impede progress. Small releases are better.

Fearlessly and relentlessly improve the code at every opportunity and never make it worse.
Almost every software developer is slowed down by bad code, yet we still continue to write bad code for some reason. We allow bad code to rot in the absence of automated unit tests. This is where The Boy Scout Rule is useful, and automated tests are really helpful in keeping this promise.

Keep my and my team's productivity high.
Don't hold on to big commits, and don't create tests that take hours to run.

Continuously ensure that others can cover for me and I can cover for them.
This one is interesting: he talks about working like a sports team and handling the situation when one of the team members is injured or down.
Pair programming can be a good tool to achieve this.

Produce estimates that are honest both in magnitude and precision; make no promise without certainty
This is a tricky one. The most honest estimate is "I DON'T KNOW", but this does not work in the real world, so we fall into the estimation trap and make false promises. He recommends drawing a curve around the uncertainty.
Your manager will come to you and say, "We need this in the next 2 days, can you try?"
Your answer should be that you are already trying and not holding back any extra capacity. You should say no when it is not possible.

Never stop learning and improving my craft.
You have to keep on learning new languages, frameworks, and libraries, otherwise you will fall behind and have to become a manager :-)
 
Links to some Uncle Bob talks that I found good:

Effective Estimation (or: How not to Lie)
The Principles of Clean Architecture
What is OO really
SOLID Principles of Object Oriented and Agile Design


Thursday, 25 August 2016

Lazy evaluation

Recently I was writing a log4j appender and wanted to use a logger inside it to log some diagnostic details during custom appender creation, but log4j initialization completes only after the appender instances are created, so messages logged during this phase are ignored.

I felt the need for lazy initialization in the custom appender and started looking at the options.

In this blog I will share the things that I tried.

One of the things that came to my mind was the singleton approach, but it is now a known fact that singletons cause problems with testing and make code impossible to extend, so mixing concurrency & object construction in this way is not that good.

In case a singleton is required, it is better to use a dependency injection framework rather than spoiling your application code.

Let's get back to lazy initialization/evaluation.

Some programming languages like Scala, Swift, etc. have built-in support for lazy, so no custom code is required, but in the Java space we still have to write thread-safe code to get it right.

Let's look at some of the options we have in Java and what kind of performance we get.

- Brute force using Synchronized
This is the simplest and most inefficient one; Scala uses this approach. The Scala-style one is available @ ScalaLazy.java
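The post's code is shown as an image and lives in the linked GitHub repo; a minimal sketch of the synchronized approach (names here are mine, not the repo's) looks like this:

import java.util.function.Supplier;

// Simplest form: every single read pays the cost of acquiring the lock,
// even long after the value has been initialized.
public class SynchronizedLazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private T value;

    public SynchronizedLazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public synchronized T get() {
        if (value == null) {            // assumes the factory never returns null
            value = factory.get();
        }
        return value;
    }
}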



- Double lock
This is a little more complex to write and gives good performance.
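A sketch of double-checked locking (again my illustration, not the blog's exact code): the volatile field is what makes the pattern correct under the Java memory model, and the lock is taken only until the value has been published.

import java.util.function.Supplier;

public class DoubleLockLazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T value;           // volatile is essential for correctness

    public DoubleLockLazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T result = value;               // first, unlocked check
        if (result == null) {
            synchronized (this) {
                result = value;         // second check under the lock
                if (result == null) {
                    value = result = factory.get();
                }
            }
        }
        return result;
    }
}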

- Using FutureTask
This approach is simple to write and gives good performance.
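A sketch of the FutureTask-based version (illustrative code, not the blog's): FutureTask.run() becomes a no-op once the task has completed, so the factory executes at most once and concurrent callers simply wait for the first result.

import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.Supplier;

public class FutureTaskLazy<T> implements Supplier<T> {
    private final FutureTask<T> task;

    public FutureTaskLazy(Supplier<T> factory) {
        this.task = new FutureTask<>(factory::get);
    }

    @Override
    public T get() {
        task.run();                     // runs the factory once; no-op afterwards
        try {
            return task.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}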


The double-lock approach gives the best performance and the brute-force one is the worst. I did a quick benchmark of 1 million calls using different numbers of threads.


Single-lock performance is very bad; let's look at the numbers with the single lock removed to see how Double Lock & FutureTask performed.


These benchmarks were done very quickly, but detailed benchmark numbers should be close.


Code for this blog post is available @ github






Sunday, 19 June 2016

Java Arrays Sort decoded


Sorting is one of the first algorithms that we learn in computer science. Sorting is such an interesting area that it has 20+ algorithms, and it is always difficult to decide which one is best.

Sorting algorithm efficiency is measured in terms of time taken & space required.

Sometimes bubble sort is best because it has no extra space requirement, so for devices where space is constrained or random access to elements is not possible it can be a good fit.

These days we tend to use library sort functions; most languages' library sort functions are adaptive and use the best algorithm depending on the size of the data.

In this blog I will share how those decisions are made in Java's Arrays.sort function.

Decisions are based on data type and size

 - Byte 
For byte the Java API decides between counting sort and insertion sort.

If the size of the input array is less than 29 then insertion sort is used (visualization of insertion sort).
For larger arrays counting sort is used; it exploits the fact that the range of byte is only -128 to 127 to sort quickly.

Counting sort has a small memory requirement and insertion sort is in-place, so overall not much allocation is done, which keeps the garbage collector happy when a byte array is sorted.
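To make the counting sort idea concrete, here is an illustrative version for a byte[] (a simplified sketch of the technique, not the JDK's actual code): count how many times each of the 256 possible values occurs, then rewrite the array in order.

public class ByteCountingSort {
    static void countingSort(byte[] a) {
        int[] count = new int[256];
        for (byte b : a) {
            count[b - Byte.MIN_VALUE]++;          // map -128..127 onto 0..255
        }
        int i = 0;
        for (int v = 0; v < count.length; v++) {  // walk the values in ascending order
            for (int c = count[v]; c > 0; c--) {
                a[i++] = (byte) (v + Byte.MIN_VALUE);
            }
        }
    }

    public static void main(String[] args) {
        byte[] data = {5, -3, 120, -128, 0, 7, -3};
        countingSort(data);
        System.out.println(java.util.Arrays.toString(data)); // [-128, -3, -3, 0, 5, 7, 120]
    }
}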

- Char
For char the decision is between counting sort & dual-pivot quicksort.

If the size of the input is greater than 3.2K then counting sort is used, and it allocates an array of 65K entries (one counter per possible char value) to implement the sort.

For smaller arrays the dual-pivot quicksort variant is used (visualization of quick sort).


The quicksort used is also in-place, so memory-wise there is not much load on the garbage collector.


- Integer/Long

For integer/long things get interesting, as merge sort makes an entry.

For input smaller than the quicksort threshold (286 elements in the JDK source), two options are available:
 - If the input is smaller than 47 elements then insertion sort is used; otherwise dual-pivot quicksort is used.

For larger input arrays there are some good edge-case checks:
 - If the array is already sorted, either ascending or descending, a single loop is enough to detect that.

 - If all the elements of the array are the same then quicksort is used, as it works best in that case.

 - Or if the elements are really jumbled, e.g. each even element is greater than the odd element next to it, then quicksort is used.

If all of these checks fail then merge sort is used: a new array of the same size is allocated and the sort is performed. Quick refresher of Merge Sort



An important thing to note about the integer sort is that if the array is already sorted then no extra memory is allocated, and if quicksort kicks in the memory allocation stays under check.


- Float/Double

Float has a special optimization for NaN: all the NaN values are moved to the end of the array and skipped from sorting.

Once the NaN values are handled, the sort goes through the same checks as the integer data type.

- Sorting on Objects

Sorting objects has slightly different rules: for objects the choice is only between merge sort & TimSort.
By default TimSort is used, which is a mix of merge sort & insertion sort.

Merge sort is more or less decommissioned; it is only used when the "java.util.Arrays.useLegacyMergeSort" flag is switched on.


In JDK 8 parallel sort options were also added, which depend on the input array size: for arrays larger than 8K elements the parallel version of the sort is used.
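Usage-wise the parallel variant is just another overload; for arrays below the JDK's internal granularity threshold it falls back to the sequential sort anyway. A quick illustration (the array sizes are arbitrary):

import java.util.Arrays;
import java.util.Random;

public class ParallelSortDemo {
    public static void main(String[] args) {
        int[] small = new Random(42).ints(1_000).toArray();
        int[] large = new Random(42).ints(1_000_000).toArray();

        Arrays.sort(small);          // single-threaded dual-pivot quicksort
        Arrays.parallelSort(large);  // splits the work across the common ForkJoinPool

        System.out.println(small[0] + " " + large[0]);
    }
}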



Friday, 8 April 2016

Binary Search

This blog is part of a series that I wanted to start: Preparation of Coding Interview.

Binary search is one of the first search algorithms that we learn; it is simple to implement, very powerful, and gives an O(log n) guarantee.

Quick recap of binary search

It is simple to implement, but nearly all binary searches were broken for a long time due to one bug; you can read about it @ Nearly All Binary Searches are Broken

Enough of history; let's get back to the topic and discuss some of the interview questions on this.

 - Implement Binary Search

This question is still asked, so you should practice it because the logic around index manipulation is a little tricky.
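The post's own implementation is shown as a screenshot; a standard iterative version with the overflow-safe mid calculation looks roughly like this:

public class BinarySearch {
    // Returns the index of key in the sorted array, or -1 if it is not present.
    static int search(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;   // safe: (low + high) / 2 can overflow
            if (sorted[mid] == key) {
                return mid;
            } else if (sorted[mid] < key) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] values = {2, 5, 8, 13, 21, 34};
        System.out.println(search(values, 13));  // 3
        System.out.println(search(values, 14));  // -1
    }
}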


Look at the line that computes MID; that is the line that had the famous bug: writing it as (low + high) / 2 can overflow for very large arrays, so the safe form is low + (high - low) / 2.

 - Find rotation point

This is a small variation of binary search: you have to find the rotation point in a rotated ordered array, e.g.

Elements are in increasing order, then the value drops to the minimum, and then the elements are in increasing order again.

How do you define rotation point ?
For any element, if both the left & right elements are greater, then the current element is at the rotation point.

In the above case the 5th element is at the rotation point because both the left (4th) and right (6th) elements are greater than the 5th.

Another example 
char[] values = {'i', 'j', 'k', 'v', 'a', 'b', 'c', 'd'}

The 5th element (i.e. 'a') is at the rotation point.

Simple binary search will not work on this because two sorted arrays are joined together.
If you try to solve it by brute force then you have to go through each element till you find one that is less than the previous element.

Brute force solution will look something like below
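(The original code is shown as an image; this is an equivalent sketch.)

public class RotationPointBruteForce {
    // Walk the array until an element is smaller than the one before it.
    // Returns the index of the rotation point, or 0 if the array is not rotated.
    static int find(char[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[i - 1]) {
                return i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        char[] values = {'i', 'j', 'k', 'v', 'a', 'b', 'c', 'd'};
        System.out.println(find(values)); // 4, the 0-based index of 'a'
    }
}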



Since this array is mostly sorted, it is possible to write a hybrid binary search that takes advantage of the sorted data.

Now the real question is how to decide when to go left or right.

If we just apply normal binary search logic then the mid point is 'v' (i.e. the 4th element); now which element should it be compared against to decide left/right?

Ah, it is the first one ('i'): if mid is greater than the first element then go right, and if it is less than the first then go left.

Let's use a picture to understand it.




 Step 1 - v > i , so we go to right
 Step 2 - b < i, so we go to left
 Step 3 - mid value (a) is at rotation point and program ends.

Code will look something like below
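(Again, the original code is an image; here is one way to write it, assuming the array really is rotated, i.e. the first element is greater than the last one.)

public class RotationPointBinarySearch {
    // Hybrid binary search: compare the mid element with the first element to
    // decide which half still contains the drop.
    static int find(char[] values) {
        int low = 0;
        int high = values.length - 1;
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (values[mid] >= values[0]) {
                low = mid + 1;   // still in the first ascending run, go right
            } else {
                high = mid;      // past the drop, the rotation point is here or to the left
            }
        }
        return low;
    }

    public static void main(String[] args) {
        char[] values = {'i', 'j', 'k', 'v', 'a', 'b', 'c', 'd'};
        System.out.println(find(values)); // 4, the 0-based index of 'a'
    }
}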



Hopefully the visualization was good enough to explain how to approach this problem.
My visualization skills are very bad & I have to work on them.


Wednesday, 6 April 2016

Preparation of Coding Interview


I want to share my coding interview experience, so I have decided to write a blog series about the questions that are asked; this is the first one in that series.

Coding interviews are very common these days, and if you find a company that does not have one as part of its selection process then I think you should stay away from it.

Let's start with why a coding interview is required. The reason is very simple, especially in times when every company is adopting Agile: they don't want to wait for months after hiring to learn about a developer's coding skills.

That is why it is the first gate that you have to pass as a developer.

Jeff Atwood shares an interesting experience in why-cant-programmers-program about the type of question that is asked to filter out candidates. FizzBuzz is a very good question to start with.

I would advise you to try FizzBuzz to make sure you are not surprised, and think about testing also, because we are in the world of TDD.
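For reference, FizzBuzz itself is tiny; a plain Java version with the logic separated from the printing (so it is easy to unit test) could look like this:

public class FizzBuzz {
    // Pure function, easy to unit test: no printing inside the logic.
    static String say(int n) {
        if (n % 15 == 0) return "FizzBuzz";
        if (n % 3 == 0)  return "Fizz";
        if (n % 5 == 0)  return "Buzz";
        return Integer.toString(n);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 100; i++) {
            System.out.println(say(i));
        }
    }
}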


What types of questions are asked

Different types of questions are asked in coding interviews, and it depends on the type of company you apply to.

 - Algorithms & Data structures

This is very common for Silicon Valley companies or product development companies, and some serious preparation is required for these types of interviews if your daily work involves only playing with frameworks (i.e. Hibernate/Spring etc.).

Things become more difficult in such interviews because many developers (like me) are so focused on learning emerging frameworks to build a bleeding-edge resume that we tend to forget basic algorithms and start tagging ourselves as some XYZ-framework developer, and companies also help the cause by putting up such job requirements.

- Simple problems using TDD

This type of question is also very common; it makes sure that you don't leave testing to the QA team and that you write code that is defect-free, easy to understand, and maintainable.

- Refactoring problems

You don't work on greenfield projects every day; most of the time is spent reading other people's code and adding features to it, so employers want to know "how quickly can you understand code written by somebody else and add a new feature to it".

TDD becomes more important in such questions because it can be used to build a safety net around existing code before you start changing it.

The above questions can be asked as pair-programming questions or take-home questions, so you have to build a solution based on the time given to you.

Some of the things the employer wants to check are:

 - Simplicity/Readability of code.
 - Maintainability of code. 
 - Design principle usage.

The bottom line is that you have to prepare for coding interviews to survive in a fast-changing market.

Stay tuned for the questions.