Big Data
What is Big Data?
The term Big Data refers to a process. Most of us are familiar with large datasets running into terabytes or petabytes (such as weather data, the videos stored at Netflix, or the crawled web pages and map data stored at Google). However, Big Data does not simply mean that you have a lot of data or a big database.
So what process are we talking about? Big Data is the process of driving business value through implicit knowledge gained from analysing the data stored throughout your organisation, whether that data sits in raw format on a file server somewhere or within a managed database system. It differs from business intelligence in that much of the value of business intelligence comes from known, structured sources, leaving much of the unstructured data ignored. Business intelligence requires a well-structured data warehouse containing known facts, measures and KPIs. Business value is derived from business intelligence through what-if analysis and by slicing and dicing the data, viewing it from different perspectives to gain insight into business performance. Big Data is far more complex, and its analysis is not so straightforward.
Big data is not analysed directly from a business point of view (such as revenue, expenses, profits and margins) but from the point of view of customers and customer behaviour. This customer orientation is why big data has so much value to organisations. But unless an organisation understands the right concepts and processes, most big data projects are unlikely ever to achieve their true value. Many big data projects fail to make it past the prototype phase because the analysis is based on traditional methods familiar to financial analysts and the CFO. Unless you begin to think outside the box and try to understand your customers, big data will remain just hype and something nice to have on your employees' resumes.
Big data is often described using four Vs. We prefer to speak of five Vs, because without the fifth V (value), a big data project is just a waste of resources and money. The five Vs of big data are:
- Volume: There is more data stored outside your database than you could ever imagine. It includes everything from web logs, mobile GPS signals, phone calls and conversations (email, Twitter, SMS, etc.) to customer behaviour such as mouse movements, eye movements and keystrokes. The volume of big data can run into petabytes, even in smaller organisations. A single customer can generate a large amount of data in a single day, but most of it is never captured and is lost in short-term memory.
- Velocity: The speed at which data arrives is extremely fast. Consider how quickly transactions are processed on the share market: companies make billions through high-frequency trading, where a fraction of a second can make or break your bank balance. Your web server logs can generate thousands of entries per second. Now imagine analysing all the network traffic on top of that, or combining that data with demographics or behaviour patterns.
- Variety: Data can come in many different forms: emails, logs, documents, crawled web pages (for competitor analysis), electrical signals, photos and videos from security cameras, to name just a few. Even background noise can contain useful data for measuring business performance. There is no limit to the kinds of data we can use to derive information.
- Veracity: The quality or trustworthiness of data differs depending on the source. For example, Twitter data can be considered poor quality and Facebook can contain unreliable data (from fake accounts), but web logs are considered almost concrete facts that can even be used as evidence in court. Financial transactions from banks are considered better and more trustworthy sources of data.
- Value: A lot of value can be derived from big data. But unless you collect the right data at the right time and perform the proper analysis, it is unlikely to deliver any value.
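To make the velocity point concrete, the sketch below counts web server log entries per second, which is usually the first measure of how fast data is arriving. It is a minimal illustration, assuming a hypothetical log format in which each line starts with an ISO 8601 timestamp followed by a space; real server logs use their own formats and would need their own parser.

```python
from collections import Counter
from datetime import datetime


def entries_per_second(log_lines):
    """Count log entries in one-second buckets.

    Assumes (hypothetical format) that each line begins with an ISO 8601
    timestamp such as '2024-05-01T10:15:03.120', followed by a space.
    """
    counts = Counter()
    for line in log_lines:
        stamp = line.split(" ", 1)[0]
        ts = datetime.fromisoformat(stamp)
        # Drop the fractional part so entries in the same second share a bucket.
        counts[ts.replace(microsecond=0)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "2024-05-01T10:15:03.120 GET /index.html 200",
    "2024-05-01T10:15:03.480 GET /products 200",
    "2024-05-01T10:15:03.910 POST /cart 201",
    "2024-05-01T10:15:04.050 GET /index.html 200",
]
rates = entries_per_second(sample)
peak_second, peak_count = rates.most_common(1)[0]
print(peak_second, peak_count)  # 2024-05-01 10:15:03 3
```

At thousands of entries per second, this kind of counting is normally done continuously over a stream rather than over a list held in memory, but the bucketing idea is the same.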
What tools are available?
There are several free and open source tools that can help you analyze big data. Please check out our page on open source for a list of tools that we recommend for databases and big data technology.
How will it help my business?
Leveraging Big Data and analytics can drive innovation and uncover potential for disruption. Big Data represents an enormous opportunity for businesses looking to differentiate themselves and to remain relevant. Organisations need to analyse the data both within and outside their boundaries, and the insight gained from analysing large volumes of data needs to be turned into business decisions. Adapting rapidly to customer demands is important in any industry, not just in retail. Applications of big data include social networks, relationship visualisation, search indexing, medical records, call detail records, the sciences, archives and large-scale e-commerce. For example, a bank can optimise its interest rates based on buying behaviour and individual risk factors. Insurance companies adjust their premiums based on the risk carried by an individual or on the location and type of motor vehicle. There are many hidden opportunities yet to be discovered in unstructured data, and how that data can be used differs for each industry.
How can we help you?
Dewlock are experts in the Big Data field. We have helped organisations process terabytes of data within a few minutes. We are experienced in deploying and managing thousands of nodes in a single cluster and can help with near-real-time, map-reduce based analysis using open source tools. Small organisations can also benefit from our services, as we can host and process the data for you with no investment in hardware. Contact us if you need help with your big data project.
Our Big Data process consists of five different phases:
- Collection: If it is not collected, it cannot be analysed. We will help you collect the data by setting up the appropriate data collection services in your organisation, so that we have the right data to analyse.
- Storage: To store large volumes of data, we use cloud technology or open source NoSQL databases. We have clusters of storage servers capable of storing petabytes of data at your disposal.
- Analysis: During this phase, we clean the data and then perform the analysis using Hadoop, Mahout and custom modules that we develop depending on your business and the type of analysis required.
- Visualisation: We use several different interactive visualisation tools so that you can see what the data is telling you.
- Action: Our analytics will give you sufficient information to improve your business through specific and measurable actions.
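The map-reduce analysis mentioned above follows three steps: a map step emits key/value pairs from each record, a shuffle step groups the values by key, and a reduce step combines each group into a result. The sketch below runs those steps in a single Python process to count page views per customer. It is only an illustration of the pattern, not Hadoop's API, and the click records and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit one (customer_id, 1) pair per hypothetical click record."""
    customer_id, _page = record
    yield customer_id, 1


def shuffle(pairs):
    """Shuffle: group values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical (customer_id, page) click records.
clicks = [("c1", "/home"), ("c2", "/shop"), ("c1", "/shop"), ("c1", "/cart")]
print(map_reduce(clicks))  # {'c1': 3, 'c2': 1}
```

In a cluster, the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; keeping each function free of shared state is what makes that distribution possible.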