Sunday 13 January 2013

The Next Big Thing: Big Data

Data is an important asset for any organization. Every organization generates data in one way or another, and almost every business decision is backed by data. As the amount of data grows, managing and processing it with traditional technologies becomes challenging.
Big data refers to data sets too large to handle comfortably with those traditional technologies, together with the problems of managing that data.

When an organization has a few gigabytes or even terabytes of data, managing it with existing technologies such as databases and standard storage devices poses no major problem.
When the data size grows exponentially, it becomes very difficult to provide a reliable, easily scalable, and cost-effective solution using traditional data technologies.

To provide a reliable, scalable, and cost-effective solution, one has to think in terms of using commodity hardware for storage. By commodity hardware I mean cheap, low-end, non-server-class machines for storing the data.
But how can we rely on cheap machines to store important data? If one machine goes down, won't the data hosted on that machine become unavailable?
Yes, cheap, low-end, non-server-class machines are less reliable than high-end server-class machines.
So the system should be built such that, even if one of the machines hosting the data goes down, the data is still available from another machine.
You may be wondering how the data can remain available if the machine hosting it goes down.
If so, you are thinking along the right lines. The answer is replication: copy the data to multiple machines so that it stays available on the others when one fails.
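To make the replication idea concrete, here is a minimal in-memory sketch. It is only an illustration under my own assumptions: the `Node` and `ReplicatedStore` classes, the `replicas` parameter, and the simple way of picking machines are all hypothetical and not taken from any real storage system.

```python
class Node:
    """A hypothetical cheap storage machine that keeps its data in memory."""

    def __init__(self, name):
        self.name = name
        self.alive = True   # set to False to simulate the machine going down
        self.blocks = {}    # key -> value stored on this machine


class ReplicatedStore:
    """Writes every value to several machines so one failure loses nothing."""

    def __init__(self, nodes, replicas=2):
        if not 1 <= replicas <= len(nodes):
            raise ValueError("replicas must be between 1 and the node count")
        self.nodes = nodes
        self.replicas = replicas

    def put(self, key, value):
        # Pick a starting machine from the key, then copy the value to
        # `replicas` consecutive machines (wrapping around the list).
        start = sum(key.encode("utf-8")) % len(self.nodes)
        for i in range(self.replicas):
            node = self.nodes[(start + i) % len(self.nodes)]
            node.blocks[key] = value

    def get(self, key):
        # Read from the first machine that is up and holds a copy.
        for node in self.nodes:
            if node.alive and key in node.blocks:
                return node.blocks[key]
        raise KeyError(key)


if __name__ == "__main__":
    machines = [Node("m1"), Node("m2"), Node("m3")]
    store = ReplicatedStore(machines, replicas=2)
    store.put("report", "quarterly numbers")

    # Take down one machine that holds the data; the read still succeeds.
    holder = next(m for m in machines if "report" in m.blocks)
    holder.alive = False
    print(store.get("report"))
```

With two replicas, the data survives the loss of any single machine. Surviving two simultaneous failures would need three replicas, at the cost of more storage.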
The system should also be such that adding a machine for storage increases the total storage space of the system. This is called a horizontally scalable system.
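Horizontal scaling can be sketched in the same spirit. The `Cluster` class and its method names below are hypothetical, made up for illustration. The sketch only tracks capacity to show that total storage is the sum of whatever machines are currently in the cluster.

```python
class Cluster:
    """A hypothetical pool of storage machines that grows by adding machines."""

    def __init__(self):
        self.capacity_gb = {}  # machine name -> storage capacity in GB

    def add_machine(self, name, capacity_gb):
        if name in self.capacity_gb:
            raise ValueError(f"machine {name!r} is already in the cluster")
        self.capacity_gb[name] = capacity_gb

    def remove_machine(self, name):
        # Raises KeyError if the machine is not part of the cluster.
        del self.capacity_gb[name]

    def total_capacity_gb(self):
        # Total space is simply the sum over all machines in the pool.
        return sum(self.capacity_gb.values())


if __name__ == "__main__":
    cluster = Cluster()
    cluster.add_machine("m1", 500)
    cluster.add_machine("m2", 500)
    print(cluster.total_capacity_gb())  # two 500 GB machines

    cluster.add_machine("m3", 1000)     # scale out: just add another box
    print(cluster.total_capacity_gb())
```

Growing the system means adding another machine to the pool instead of replacing an existing one with a bigger machine. The latter would be vertical scaling.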
Coming to cost: since we are talking about commodity hardware, if a machine goes bad, just throw it away and add a new one.
As web-based services grow, more and more popular organizations are trying to capture user activity and deduce behavior over time, so that they can target services or ads.
Now everyone is realizing the power of data. No doubt it will be the next big thing in the industry, along with the cloud.

