Thursday 6 August 2015

Social Media Analytics using R + Hadoop

Social Media Analytics Using R + Hadoop (RHadoop):
This article presents an approach to analytics with RHadoop. In domains such as biomedical research, analysis at educational institutions, and statistical computing, we use R to find patterns, run predictive analyses, and draw further insights from data. If the data is limited and its usage is nominal, those analyses can be done with R alone. But think of scenarios where the data grows huge, into petabytes.
The diagram below shows how R fits together with Hadoop for social media analytics.



                                          Fig (1): R + Hadoop with social media analytics

RHadoop Setup and Installation:
--> Set up R on your system, the latest release being R 3.1.3, along with the required packages. For installation, refer --> http://cran.r-project.org/bin/windows/base/

--> Set up Hadoop as a single-node or multi-node cluster.
Refer --> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
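As a quick check after both installations, the packages used later in this post can be installed from an R session. A minimal sketch follows; note it assumes the LinkedIn package is installed from GitHub (the repository name here is an assumption), since it may not be on CRAN:

# Install the CRAN packages used in the examples below
install.packages(c("ROAuth", "twitteR", "RCurl"))

# The LinkedIn package is assumed to come from GitHub rather than CRAN
# install.packages("devtools")
# devtools::install_github("mpiccirilli/Rlinkedin")

library(twitteR)   # quick check that the packages load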

RHadoop Use Cases

--> Coming to the use cases of RHadoop, it shows up in two ways: one with streamed data (such as social media sites and news feeds from different sources), and one with data that resides in standard traditional or NoSQL databases (like MongoDB).

For social media analytics using RHadoop, we have the following setup:
--> A Hadoop setup with R running on it
--> APIs to connect with different social media such as LinkedIn, Facebook, and Twitter
--> The packages to load in R: ROAuth, twitteR, RLinkedin, RCurl (see the sketch below)
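Here is a minimal sketch of wiring an R session to the Twitter API with these packages; the credential values are placeholders you obtain from your own Twitter developer application:

library(twitteR)
library(ROAuth)

# Placeholder credentials from your Twitter developer application
consumer_key    <- "YOUR_CONSUMER_KEY"
consumer_secret <- "YOUR_CONSUMER_SECRET"
access_token    <- "YOUR_ACCESS_TOKEN"
access_secret   <- "YOUR_ACCESS_SECRET"

# Authenticate this R session against the Twitter API
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)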

A key use case for streaming data looks like this:
R <------> Twitter: fetch tweets and slice and dice the fetched data, as sketched below.
R <------> LinkedIn: connect to LinkedIn, pull data, and slice and dice it.
We can do the same with Facebook and Instagram.
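For instance, a sketch of fetching tweets and slicing and dicing them, assuming the session is already authenticated as above (the search term is just an example):

# Fetch recent English tweets for an example search term
tweets <- searchTwitter("#bigdata", n = 200, lang = "en")
df     <- twListToDF(tweets)   # list of status objects -> data frame

# Slice and dice: tweet counts per user
head(sort(table(df$screenName), decreasing = TRUE))

# Keep only original tweets (drop retweets) for further analysis
originals <- df[!df$isRetweet, c("screenName", "created", "text")]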

The second use case looks like this:
R <-----> MongoDB: fetch the documents, apply logic to the fetched documents, and perform the analytics.
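One way to sketch this in R is with the mongolite package; the database and collection names below are hypothetical:

library(mongolite)

# Connect to a local MongoDB; database and collection names are hypothetical
con <- mongo(collection = "tweets", db = "social", url = "mongodb://localhost")

# Fetch matching documents into a data frame and apply logic in R
docs <- con$find('{"lang": "en"}', limit = 1000)
summary(nchar(docs$text))   # e.g., distribution of document text lengths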

As of now, standalone R offers no parallel, distributed processing on its own. Some distributions, however, do come with parallel-processing support.
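To illustrate what the RHadoop packages add on top of plain R, here is the classic rmr2 pattern for pushing a computation through Hadoop MapReduce. This is a sketch, assuming rmr2 is installed and the HADOOP_CMD and HADOOP_STREAMING environment variables are set:

library(rmr2)

# Write a small sample vector to HDFS
ints <- to.dfs(1:1000)

# MapReduce job: bucket the numbers by their last digit, then sum each bucket
job <- mapreduce(
  input  = ints,
  map    = function(k, v) keyval(v %% 10, v),
  reduce = function(k, vv) keyval(k, sum(vv))
)

# Read the result back from HDFS into the R session
from.dfs(job)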


Wednesday 5 August 2015

Big Data Core Indicators and Key Aspects

Big Data Core Indicators:



As we all talk about big data, the core indicators that come into the picture are the four V's:
Volume, Velocity, Variety, and Veracity.
These V's are going to define big data and its future. Technically, big data comes into the picture whenever an organization or company has to deal with any of these V's at scale.

Big Data Core Indicators


Key Aspects of a Big Data Platform

1. Integration -- The point is to have one platform to manage all of the data. Big data has to be bigger than just one technology.
2. Analytics -- A very important point. We see big data as a viable place to store and analyze data; the sophistication and accuracy of the analytics matter.
3. Visualization -- The need to bring big data to the users.
4. Development -- The need for sophisticated development tools, for each engine and across them, so the market can develop analytic applications.
5. Workload optimization -- Improvements upon open source for efficient processing and storage.
6. Security and governance -- Sensitive data needs to be protected, and retention policies need to be determined.


As technology advances day by day, the amount of data that business requirements deal with keeps increasing. Big data analytics and solutions are therefore providing better, enhanced ways to solve business problems across different industry verticals.

Big Data At Glance


The big data ecosystem can be confusing. The popularity of “big data” as industry buzzword has created a broad category. As Hadoop steamrolls through the industry, solutions from the business intelligence and data warehousing fields are also attracting the big data label. To confuse matters, Hadoop-based solutions such as Hive are at the same time evolving toward being a competitive data warehousing solution.
Understanding the nature of your big data problem is a helpful first step in evaluating potential solutions. Let's remind ourselves of the definition of big data.

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."

Big data problems vary in how heavily they weigh in on the axes of volume, velocity, and variety. Predominantly structured yet large data, for example, may be most suited to an analytical database approach.



This survey makes the assumption that a data warehousing solution alone is not the answer to your problems, and concentrates on analyzing the commercial Hadoop ecosystem. We’ll focus on the solutions that incorporate storage and data processing, excluding those products which only sit above those layers, such as the visualization or analytical workbench software.

Getting started with Hadoop doesn't require a large investment, as the software is open source and is also available instantly through the Amazon Web Services cloud. But for production environments, support, professional services, and training are often required.