Thursday 6 August 2015

Social Media Analytics using R + Hadoop

Social Media Analytics Using R + Hadoop (RHadoop):
This article outlines an approach to doing analytics with RHadoop. In domains such as biomedicine, research and analysis at educational institutions, and statistical computing, we use R to find patterns, run predictive analyses, and draw further insights from data. When the data is limited and its usage modest, R alone can handle these analyses. But consider scenarios where the data grows huge, into petabytes.
Below is a diagram showing how R fits together with Hadoop for social media analytics.



Fig (1): RHadoop with social media analytics

RHadoop Setup and Installation:
--> Set up R on your system (the latest version at the time of writing is R 3.1.3), along with the packages we will work with. For installation, see:
Refer --> http://cran.r-project.org/bin/windows/base/

--> Set up Hadoop as a single-node or multi-node cluster.
Refer --> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
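RHadoop's rhdfs and rmr2 packages locate the Hadoop installation through the HADOOP_CMD and HADOOP_STREAMING environment variables, which are usually set in R with Sys.setenv() before the packages are loaded. As a minimal sketch, written in Python so it can run before R is started, the following checks that both variables are set and point to existing files. The helper name check_rhadoop_env is hypothetical, not part of RHadoop.

```python
import os

# Environment variables the RHadoop packages read to find Hadoop:
# rhdfs needs HADOOP_CMD; rmr2 needs HADOOP_CMD and HADOOP_STREAMING.
REQUIRED_VARS = ("HADOOP_CMD", "HADOOP_STREAMING")


def check_rhadoop_env(env=None):
    """Hypothetical helper: return {variable: problem} for each required
    variable that is unset or points to a missing file; {} means all OK."""
    env = os.environ if env is None else env
    problems = {}
    for name in REQUIRED_VARS:
        path = env.get(name)
        if not path:
            problems[name] = "not set"
        elif not os.path.isfile(path):
            problems[name] = f"points to a missing file: {path}"
    return problems


if __name__ == "__main__":
    issues = check_rhadoop_env()
    if issues:
        for name, message in issues.items():
            print(f"{name}: {message}")
    else:
        print("HADOOP_CMD and HADOOP_STREAMING look OK")
```

The actual values depend on your installation: HADOOP_CMD is the path to the hadoop executable and HADOOP_STREAMING is the path to the Hadoop streaming JAR.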

RHadoop Use Cases

--> RHadoop use cases fall into two groups: one works with streamed data (such as social media sites and news feeds from various sources), and the other works with data that resides in traditional relational or NoSQL databases (such as MongoDB).

For social media analytics with RHadoop, we need the following setup:
--> A Hadoop setup with R running on it
--> APIs to connect to social media platforms such as LinkedIn, Facebook, and Twitter
--> The R packages ROAuth, twitteR, Rlinkedin, and RCurl, which must be loaded in R

The key use case for streaming data looks like this:
R <------> Twitter: connecting to Twitter, fetching tweets, and slicing and dicing the fetched data.
R <------> LinkedIn: connecting to LinkedIn, fetching data, and slicing and dicing it.
We can do the same with Facebook and Instagram.
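To make "slicing and dicing" concrete, here is a minimal sketch in Python over a few hypothetical tweet records. The field names "user" and "text" are assumptions for illustration, not the exact shape returned by twitteR or the Twitter API. It counts hashtags across all tweets and groups tweet texts by user, two typical first cuts of fetched social media data.

```python
import re
from collections import Counter, defaultdict

# A hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(tweets):
    """Count hashtags (case-insensitively) across all tweet texts."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(tweet["text"]))
    return counts


def tweets_per_user(tweets):
    """Group tweet texts by the user who posted them."""
    by_user = defaultdict(list)
    for tweet in tweets:
        by_user[tweet["user"]].append(tweet["text"])
    return dict(by_user)


# Hypothetical sample data; real tweets would come from the Twitter API.
sample = [
    {"user": "alice", "text": "Loving #RHadoop and #BigData"},
    {"user": "bob", "text": "#bigdata meetup tonight"},
    {"user": "alice", "text": "Trying #R on Hadoop"},
]

print(hashtag_counts(sample).most_common(2))
print(tweets_per_user(sample))
```

Lower-casing the tags makes "#BigData" and "#bigdata" count as the same topic; whether that is desirable depends on the analysis.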

The second use case looks like this:
R <------> MongoDB: fetching documents, applying logic to the fetched documents, and performing the analytics.
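The "fetch documents, apply logic, analyse" pattern can be sketched in Python with documents represented as dictionaries, which is how MongoDB drivers such as pymongo return query results. This is a minimal, self-contained sketch: the collection contents and the field names "source", "likes", and "lang" are hypothetical, and no database connection is made. It filters the documents with a predicate and averages one field per group, skipping documents that lack a field, since MongoDB documents need not share a schema.

```python
from collections import defaultdict


def average_by(docs, group_field, value_field, predicate=lambda d: True):
    """Average value_field per distinct group_field value over the documents
    that pass predicate; documents missing either field are skipped."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        if not predicate(doc):
            continue
        if group_field not in doc or value_field not in doc:
            continue
        key = doc[group_field]
        totals[key] += doc[value_field]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical documents, shaped like query results from a MongoDB collection.
posts = [
    {"source": "twitter", "likes": 10, "lang": "en"},
    {"source": "twitter", "likes": 30, "lang": "en"},
    {"source": "facebook", "likes": 5, "lang": "de"},
    {"source": "facebook", "likes": 15, "lang": "en"},
    {"source": "linkedin", "lang": "en"},  # no "likes" field: skipped
]

print(average_by(posts, "source", "likes", predicate=lambda d: d.get("lang") == "en"))
```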

As of now, standalone R does not by itself distribute computation in parallel across a cluster.
Some distributions, however, come with parallel distribution support.
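Hadoop supplies this parallelism through the MapReduce model: the input is split, a map function runs on each split in parallel and emits key/value pairs, the pairs are grouped by key (the shuffle), and a reduce function combines each group. RHadoop's rmr2 package exposes this model to R through its mapreduce() function, which takes R map and reduce functions. The sketch below imitates the model in plain Python on a single machine, with threads standing in for cluster nodes; it illustrates the idea only and is not how Hadoop or rmr2 is implemented.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(lines):
    """Map step: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_outputs):
    """Shuffle step: group the intermediate values from all splits by key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines, n_splits=2):
    # Partition the input into n_splits chunks (round-robin here;
    # Hadoop instead splits a file into contiguous blocks).
    splits = [lines[i::n_splits] for i in range(n_splits)]
    # Run the map step on each split concurrently; threads stand in for nodes.
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        mapped = list(pool.map(map_words, splits))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


posts = ["R on Hadoop", "Hadoop for big data", "big data with R"]
print(word_count(posts))
```

Because the reduce step only sees grouped values, the result is the same however the input is split, which is what lets Hadoop spread the map work across many machines.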

