Thursday, 25 August 2016

Lazy evaluation

Recently i was writing log4j appender and wanted to use logger in it to log some diagnostic details during custom appender creation, but log4j initialization completes only after appender instance are created, so message logged during this phase are ignored.

I felt the need for lazy initialization in custom appender and started to look at options.

In this blog i will share things that i tried.

One of the thing that came to my mind was Singleton approach but now it is known fact that singleton causes problem with testing and make it impossible to extend it, so approach of mixing concurrency & object construction is not that good.

Incase if singleton is required then it is better to use Dependency Injection framework rather than spoiling your application code.

Lets get back to lazy initialization/eval.

Some programming language like scala/swift etc has support for lazy, so no custom code is required to do this but in java space we still have to write thread safe code to get it right.

Lets look at some options we have in java and what type of performance we get.

- Brute force using Synchronized
This is the most simple and inefficient one, scala is using this approach. Scala one is available @ ScalaLazy.java



- Double lock
This is little complex to write and gives good performance.

- Using Future task
This approach is simple to write and gives good performance.


Double lock approach gives the best performance and brute force one is worst. I did quick bench mark for 1 Million calls using different number of thread.


Single lock performance is very bad, lets have look at the number by removing single lock to see how Double Lock & Future Task performed.


These benchmark are done very quickly but detailed benchmark numbers should be close.


Code for this blog post is available @ github






Sunday, 19 June 2016

Java Arrays Sort decoded


Sorting is one the first algorithm that we learn in computer science. Sorting is such an interesting area that it has around 20+ algorithm and it is always difficult to decided which one is best.

Sorting algorithm efficiency is measured in terms of time taken & space required.

Some time bubble sort is best because it has no space requirement and for device where space is constraint or random access of element is not possible then it can be good fit .

These days we tend to use library sort functions, most of language library sort function is adaptive and it uses best algorithm depending on size of data.

In the blog i will share how those decision are made in java Arrays.sort function.

Decision are based on Data Type and Size

 - Byte 
For byte java API decides between Counting Sort or Insertion Sort.

If size of input array is less than 29 then insertion sort is used, visualization of insertion sort
For large arrays Counting sort is used and it is based on fact that range of byte is -128 to 128 and it used as advantage to do quick sort.

Counting sort has little memory requirement and insertion is in place, so over all not much allocation is done and it will keep garbage collector happy when byte array is sorted.

- Char
For char decision is between Counting Sort & Dual Pivot QuickSort

If size of input is greater than 3.2K then counting sort it used and it allocates array of 65K size to implement sorting.

For smaller array Quick Sort variant using Dual pivot is used , visualization of quick sort.


QuickSort used is also in-place , so memory wise not much load on Garbage Collector.


- Integer/Long

For integer/long things gets interesting as Merge Sort makes entry.

For input less than 256 , two options are available
 - If input is less than 47 then Insertion Sort and for other case Dual Pivot QuickSort is used.

For large input array there are some good edge case checks
 - If arrays is already sorted either Ascending or Descending then it is single loop to check that.

 - If element of array are same then Quick Sort is used as it works best in such case.

 - Or if element are really jumbled like for e.g like each even element is greater then odd element then it will use Quick Sort.

and at last all this checks fails then Merge Sort is used and new array of same size is allocated and sorting is performed. Quick refresher of Merge Sort



Important thing to note about Integer sort is that if Array is already sorted then no memory is allocated and if QuickSort kicks in memory allocation is under check.


- Float/Double

Float has special optimization for NAN, all the NAN are moved to end of array and they are skipped from sorting.

Once NAN values are handled then sorting goes through same checks as INTEGER data type.

- Sorting on Objects

Collection sorting has little different rules, for collections it is only between Merge Sort & Timsort.
By default Timsort is used which is mix of merge & insertion sort .

Merge sort is more over decommissioned and it is only used when "java.util.Arrays.useLegacyMergeSort" flag is switched on.


In JDK 8 parallel sort options are also added which are based on input size of array, for arrays with size greater than 8K then parallel version of sort is used.



Friday, 8 April 2016

Binary Search

This blog is part of series that i wanted to start Preparation of Coding Interview

Binary search is one of the first search algorithm that we learn, it is simple to implement and very powerful, gives 0(log n) guarantee.

Quick recap of binary search


  


















It is simple to implement but nearly all binary search were broken due to one bug for long time, you can read about it @ Nearly All Binary Searches are Broken  

Enough of history lets get back to topic and discus some of the interview question on this 

 - Implement Binary Search

This question is still asked so you should practice it because all the logic around index manipulation is little tricky.


Look the line that is finding MID, it is the line that had the bug

 -Find rotation point

This is small variation of binary search, you have to find rotation point in ordered array, for e.g 
















Elements are in increasing order then it drops to min value and then again elements are in increasing order.

How do you define rotation point ?
for any element if both left & right element are greater then current element is at rotation point.

In above case 5th element is at rotation point because both left(4th) and right(6th) are greater than 5th

Another example 
char[] values = {'i', 'j', 'k', 'v', 'a', 'b', 'c', 'd'}

5th element(i.e 'a') is at rotation point

Simple Binary Search will not work on this because 2 sorted array are joined in this.
If you try to solve this by brute force then have to go through each element till you find element that is less than previous.

Brute force solution will look something like below



Since this array is sorted , so it is possible to write hybrid binary search that can take advantage of sorted data.

Now real question is how to decided when to go left or right ?

If we just apply normal binary search logic then mid point is 'v'(i.e 4), now which element should be compared to decided left/right ?

Ah it is the first one ('i') , if mid is greater than first then go to right or if it is less than first then go to left.

Lets put some picture to understand it.




 Step 1 - v > i , so we go to right
 Step 2 - b < i, so we go to left
 Step 3 - mid value (a) is at rotation point and program ends.

Code will look something like below



Hopefully visualization was ok enough to explain how to approach this problem.
My visualization skills are very bad & i have to work on it.


Wednesday, 6 April 2016

Preparation of Coding Interview


I want to share my coding interview experience, so i have decided to write blog series about questions that are asked and this is first one in that series.

Coding interview are very common these days and if you find company that does not have it as part of selection process then i think you should stay away from them.

Lets start with why coding interview is required and reason is very simple especially in times when every company is adopting agile and they don't want to wait for months after hiring to know about coding skills of developer.

So that is the reason why it is the first gate that you have to pass as developer. 

Jeff Atwood share interesting experience about why-cant-programmers-program and type of question that is asked to filter out candidate. FizzBuzz is very good question to start with. 

I would advise to try FizzBuzz to make sure you are not surprised and think about testing also because we are in world of TDD


What types of question are asked

Different types of question are asked in coding interview and it depends on type of company you apply

 - Algorithms & Data structures

This is very common for Silicon valley company or product development company and some serious preparations is required for these types of interview if your daily work involves only playing with frameworks(i.e hibernate/spring etc). 

Things become more difficult for such interview because may developers(like me) are focused so much on learning emerging framework to build bleeding edge Resume that we tend to forget basic algorithm and start tagging ourself as some XYZ framework developer and company also help to the cause by putting such job requirement.

- Simple problems using TDD

This type of questions is also very common and it is make sure that you don't leave testing to QA team and write a code that is defect free , easy to understand and maintainable.

- Refactoring problems

You don't work on green field project every day and most of the time is spent in reading others code and adding features to it, so employer want to know "how do you quickly understand code written by somebody else and add new feature to it".

TDD becomes more important in such questions because it can be used to build all the safety net around existing code before you start changing it.

Above question can be asked as pair programming question or take a home type of question, so you have to build solution based on time given to you. 

Some of the things employer wants to check are

 - Simplicity/Readability of code.
 - Maintainability of code. 
 - Design principal usage.

Bottom line is you have to prepare for coding interview to survive in fast changing market.

Stay tuned up for questions.


Saturday, 27 February 2016

How to win friends and influence people

Recently i went on long vacation and took couples of books to read while i enjoy time with family.

One of the book that i read is How-Win-Friends-Influence-People by Dale Carnegie. This book is about human engineering by reading this book i came to know simple things can make big difference.

I like this book so much that i read it twice in 2 week.

This book is practical hand book for real life. Some of the principle mention in this book are

 - Don't criticize or condemn or complain.
This is the first approach that is used to fix the problem and it should never be used.

- Give honest sincere appreciation.
  He gives many example where act of appreciation change the way people look at life. 
Appreciation should be be confused with flattery(i.e cheap praise), few people likes flattery but it will do no magic as sincere appreciation will do.

People starve for appreciation same as food and it is legal tender that all soul enjoy but still we forget or avoid to appreciate somebody for years.

Dale Carnegie makes reference to below quote and i think it is one of the quote that should be read every morning.

"I shall pass this way but once any good there for that i can do or any kindness can show to any human being let me do it now , let me not defer or neglect it for i shall not pass this way again."


- Talk in terms of other people wants.
It is based on simple rule of fishing, you don't put the food that you like in the hook for fishing.

Book has many example on how to do it, some of the areas covered are below. I like the Kids example because i need is desperately. 

 - Kid not going to school
 - Feeding issue
 - Smoking issue with kid
 - Business negotiation
 - Job interviews.

Idea to make the person want to do rather than forced to do it by talking about what other person want and how he can get it.

Another quote by Henry ford refereed in book

"If there is any one secret of success it lies in ability to get the other person point of view and see things from that person angle and as well as from you own"



- Get genuinely interested in other people.

People are only interested in themselves in morning, noon & night and this was confirmed by analysis of telephone conversation to find which word is used most. 
It was word "I", it was used thousands of time in just 500 calls.

So to make friends we have to change our interest. One of the quote from book says 

"You can make more friends in 2 months by genuinely interested in other people than in 2 years trying to get other people interested in you."

- Smile
Mr Carnegie says that Smile makes us human and shares various experiment result on smile.

People likes to see smile and they smile back at you, it make us feel so much better. 
Smile brings some thing that is more valuable than $$$$, it is in terms of happiness and friendship. You are much more rich.

Smile Philosophy 

"The value of smile , it cost nothing but creates much as it enriches those who receives without loosing anything for those who gives and it happens in flash and memory last forever , none are so rich that get along with it and none so poor for its benefit. It creates happiness and it is nature best antidote for trouble."


I have long way to go in Human engineering and this book is definitely first step .

These are only few of the principal mention in book. I highly recommend this book, if you want to read only one book in 2016 then this is the book.


Friday, 11 December 2015

Mr Cool - Bloom Filter


HashCode based data structure are basis of many algorithms, it is used for fast look up , membership queries, group by etc.

HashCode is also used to build some interesting class of data structure. In this blog i will share one of such data structure that has many useful application in real world.

Take a example that you have million or billions of item and you want to keep track of distinct items.
This is very simple problem and can be solved by using HashMap but that requires huge memory depending on number of distinct items because it has to stores actual element also.

To make this problem little interesting we can put memory constraints on solution, so effectively we need space efficient solution to test membership of item.

Nothing comes for free in this world , so with less memory we have to trade off accuracy and in some case it is fine for e.g Unique Visitor on website, Distinct Page visited etc

First intuition to solve this problem by allocating N buckets and mark the bucket based on hashcode, no need to store actual value just mark slot index.







Quick question that comes to mind is what will happen in case of collision and in case of that our result will be wrong but it will be wrong by some percentage only and there are couple of ways improve result.

 - Allocate more buckets but use single hash function.
 - Use multiple hash functions and each of the hash function has its own dedicated bucket.

Multiple hash function options works very well , using that result can be around 96% correct !

So if we use multiple hash function then data structure will look something like below.















This data structure is called Bloom Filter and it maintains X bit vector and apply X hash function on input value to mark bits.

for checking if value already exists just check if bits are set to true or not by using multiple hash function.

In terms of memory requirement bit vector of X length is required and get more accurate result multiple hash based bit vector can be used. Quick math will give fair idea about memory requirement


Bloom filter is very compact in terms of memory and size can be controlled depending on requirement.
Memory can be further reduced by using compressed bit vector.

Trade Off
It is important to know trade off done to achieve the efficiency .

 - It is probabilistic data structure means result are not 100% correct but interesting thing about this data structure is that it can give FALSE POSITIVE but never FALSE NEGATIVE. Which makes it good fit for many practical application

 - Multiple hash function are used , so read/write performance will be based on hash function.

Application
- DB Query filter
One of the most common use case. Useful in reducing false query to DB

- Joins
Yes bloom filter provide alternate way to do joins especially in distributed system. So on one of the dataset(i.e smaller) build bloom filter and send it to other nodes for joining, since it is very compact in memory so transport to remote system does not adds big overhead.
One of the disruptive Big data processing system Spark is using it for joins

- Alternate to B-Tree
B-Tree can be also used for membership queries but if size of B-Tree becomes large enough that it can't fit in the memory then all the benefit is lost.
Another big data system Cassandra is using it for read request by having bloom filter type data-structure on top of data block to avoid IO operation

Another Open source in big data space Parquet is using it for fast filters.

- Partition Search
This is just another variation of above use case, Apache druid.io which is again into big data space for fast analytic is using it by partitioning data on ingestion by time and in each partition column has bloom filter type of index for fast filters.

- Safe Website filter
This is an interesting one, google chrome uses bloom filter to find if site is safe or not.


Simple implementation of Bloom Filter is available @ github
In this implementation each hash function has independent bit vector , but single common vector can be used for all the hash function.

Monday, 5 October 2015

Soft Skills - How to teach yourself

I have started reading Soft-Skills-software-developers-manual book and liked it, it is has lot of valuable information that can help in better management/planning of software development career.

All the chapters are well written but i liked chapter - "How to teach yourself" so much that i wanted to write blog about it and keep it for my future reference.

This chapter starts with what is typical learning strategy and i have to confess that i also use this simple and no so effective learning technique

- Pick a book
- Read it cover to cover and practice examples/samples given in it.

Problem with this approach is that it takes time and we might focus on lot of things that is actual not required upfront when you start working on it.

Sounds so much like water fall method :-(

John Sonmez shares his approach and it is worth reading it.  His 10-step process is pragmatic approach to learning.

- Step 01 - Get the big picture
This is about having 50,000 feet overview of topic that you want to learn , this will help in knowing how big is the topic and what are the sub topics under it.
This can be done by doing internet searches .

- Step 02 - Determine Scope
Once you know how big/complex is the topic then decided scope of your learning

Step 03 - Define Success
Since you know your scope , so it must to define success criteria , this helps in having clear picture in mind of what it will look like when you are done.

Step 04 -Find Resource
This is pretty simple given we can get tons of information from internet, find all the resource that will help like blogs/books/videos/podcast/training material etc.
You don't have to get hung up on the quality of resource, it will be filtered later.

Step 05 - Create Learning Path
This is very important part because there is natural progression for learning, so learning path should be like that, some examples can be taken from books how topics are split based on complexity

Step 06 - Filter Resource
By this time we know what are we going to learn and in what order, so filter resource that you have collected to fit your need.

Step 07 - Learn enough to get started
This is one of the step in which most of us get too much involved and loose the track. You have to avoid the temptation of going too much deep in this part. Learn only enough so that you can get started and explore the things.

Step 08 - Play around
This step is about learning by trying/experimenting things and then referring reference material to get answers of your question.

Step 09 - Learn enough to do something useful
This step is for doing deep dive and learn as much your want so that. You have to keep eye on success criteria while you are learning, so that your are on right track.

- Step 10 - Teach

Books mention below quote for this step

Tell me and I forget. Teach me and I remember. Involve
me and I learn.
—Benjamin Franklin

I highly recommend this book , it is worth spending time on this book to learn stuff that will help in building successful career.