Sunday, March 24, 2013

Identifying Classes using modelling

Identifying the right classes for a system is an art rather than a skill. Modelling too many classes results in overly complex code which is difficult to maintain and may also result in poor performance. On the other hand, if there are too few classes it may result in tight coupling. Striking a balance between the two provides the basis for a good design.
When designing a system, always believe that the best design is yet to come. Revisiting and refining the proposed design will always help carve out the best design. Domain knowledge is an essential factor in coming up with a good design.
Since it may not be feasible for a designer to be a domain expert right from the beginning, there are some design guidelines for identifying the classes. The most commonly used approaches are:
1. Noun/Noun Phrases approach:
In this approach, all the nouns and noun phrases are identified from the requirements; each one becomes a candidate class.
2. CRC card approach:
Using the Class Responsibility Collaborator (CRC) approach, take a single requirement, identify the similar objects and group them into a class, then record the responsibilities and collaborators for that class. Repeating the same for each requirement yields a list of candidate classes.
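As a quick illustration (the requirement and class names below are hypothetical, not taken from any particular system), consider the requirement "a customer places an order for one or more books." The nouns Customer, Order and Book become candidate classes, and the CRC card for Order can be sketched directly as a Java skeleton:

import java.util.ArrayList;
import java.util.List;

// Candidate classes from the noun analysis of the hypothetical requirement
// "a customer places an order for one or more books".
class Customer {
    private final String name;
    Customer(String name) { this.name = name; }
}

class Book {
    private final String title;
    private final double price;
    Book(String title, double price) { this.title = title; this.price = price; }
    double getPrice() { return price; }
}

// CRC card for Order:
//   Responsibilities: know its line items, compute the order total.
//   Collaborators:    Customer, Book.
class Order {
    private final Customer customer;                        // collaborator
    private final List<Book> books = new ArrayList<Book>(); // collaborator

    Order(Customer customer) { this.customer = customer; }

    void add(Book book) { books.add(book); }                // responsibility: know its items

    double total() {                                        // responsibility: compute the total
        double sum = 0;
        for (Book b : books) { sum += b.getPrice(); }
        return sum;
    }
}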
Using these approaches, a list of candidate classes can be identified. From this list remove the classes which are duplicates and those which are not related to the system. On analysing the classes, if you find that a class does not have a significant purpose of its own, define it as an attribute instead of a class. At the end of this analysis a number of candidate classes will have been eliminated, and the design classes are derived from the remaining list.
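Continuing the hypothetical example, a noun such as "delivery date" might initially show up as a candidate class; since it carries no behaviour or responsibilities of its own, it is better folded into Order as an attribute:

import java.util.Date;

// "DeliveryDate" had no responsibilities of its own, so instead of a separate
// class it becomes a plain attribute of Order (illustrative names only).
class Order {
    private Date deliveryDate;   // demoted from candidate class to attribute

    void setDeliveryDate(Date date) { this.deliveryDate = date; }
    Date getDeliveryDate() { return deliveryDate; }
}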
As the design progresses it is very common to find that a few of the classes which were identified as core classes are not actually required. Having said that, it is also possible to come up with entirely new classes that were missed in the initial candidate list.
The right set of classes eventually emerges after a number of iterative and incremental cycles.

Saturday, January 19, 2013

Apache Hadoop

Apache Hadoop is an open-source Java framework for processing large amounts of distributed data. Hadoop was heavily backed by Yahoo! to enable processing of very large data sets in a cost-effective way.
It is based on Google's MapReduce programming model, a parallel and distributed approach to processing large datasets. MapReduce is used by Google and Yahoo! for their web search. Using MapReduce, data is transformed into a set of key/value pairs which are later aggregated to produce the results.
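The canonical word-count job illustrates this key/value flow: the map step emits a (word, 1) pair for every word it sees, and the reduce step sums the values for each word. The sketch below follows the standard Hadoop MapReduce Java API (the org.apache.hadoop.mapreduce packages); the input and output paths are assumed to be passed on the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: split each input line into words and emit (word, 1).
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}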
Data in Hadoop is broken down into smaller blocks and distributed throughout the cluster. This allows the MapReduce functions to be executed on smaller subsets of the large data set. Apart from Java, Hadoop also provides bindings for other languages, such as C++ (through Hadoop Pipes) and scripting languages like Perl (through Hadoop Streaming).
Hadoop is ideal for storing large amounts of data, on the order of terabytes and petabytes. The Hadoop Distributed File System (HDFS) is employed as the storage system; it is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS manages storage on the cluster by breaking incoming data into pieces, called "blocks," and storing each block redundantly across the pool of servers. By default, HDFS stores three complete copies of each file by replicating each block to three different servers, but the replication factor is configurable.
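The replication factor is governed by the dfs.replication property, which is normally set cluster-wide in hdfs-site.xml but can also be overridden by a client. The minimal sketch below uses the standard Hadoop Java API; the file path is just a placeholder.

import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Keep two copies of each block for files written by this client,
    // instead of the default of three set by the cluster configuration.
    conf.set("dfs.replication", "2");

    FileSystem fs = FileSystem.get(conf);
    OutputStream out = fs.create(new Path("/tmp/replication-demo.txt"));
    try {
      out.write("stored with a replication factor of 2".getBytes());
    } finally {
      out.close();
    }
  }
}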
Hadoop is flexible and scalable and can process large amounts of data in parallel. Because it can be deployed on large clusters of commodity hardware, it is also very cost-effective.
It is ideal to use Hadoop when there is a need to analyze enormous amounts of information, for example to understand usage patterns and to predict demand.