Projects
Building a Distributed, Replicated, and Fault Tolerant File System - Java
- Implemented a distributed file system in Java. The system consists of a Controller, Chunk Servers, and a Client.
- Files are stored in 64KB chunks with a replication level of 3. The system provides fault tolerance by continuously detecting and fixing errors in stored chunks.
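A minimal sketch of splitting a file into 64KB chunks and assigning each chunk to 3 chunk servers. Python stands in for the original Java. The round-robin placement, the server names, and the SHA-1 checksum used to detect corrupted chunks are hypothetical choices for illustration, not the project's actual design.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # 64KB chunks
REPLICATION = 3         # each chunk is stored on 3 chunk servers

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def place_replicas(num_chunks: int, servers: list, replication: int = REPLICATION) -> list:
    """Hypothetical round-robin placement: chunk i goes to `replication`
    consecutive, distinct servers starting at index i."""
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return [[servers[(i + r) % len(servers)] for r in range(replication)]
            for i in range(num_chunks)]

def checksum(chunk: bytes) -> str:
    """Hypothetical integrity check: a chunk server could compare this digest
    with the stored one to detect a corrupted chunk and fetch a healthy replica."""
    return hashlib.sha1(chunk).hexdigest()
```

For a 200KB file this yields four chunks (three of 64KB and one of 8KB), each mapped to three distinct servers.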
Scalable Server Design Using Thread Pools and Load Balancing - Java
- Designed and implemented thread pools to handle server load without using third-party libraries.
- Observed that mean per-client throughput multiplied by the number of active connections equals the server throughput, and that the standard deviation of per-client throughput is low.
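A minimal sketch of a fixed-size thread pool built only from the standard library, in the spirit of the bullet above. Python stands in for the original Java. The shared FIFO task queue, the `None` shutdown sentinel, and the `ThreadPool`/`submit`/`shutdown` names are assumptions for illustration.

```python
import queue
import threading

class ThreadPool:
    """Fixed number of worker threads pulling tasks from one shared FIFO queue."""

    def __init__(self, num_workers: int):
        self.tasks: queue.Queue = queue.Queue()
        self.workers = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(num_workers)]
        for w in self.workers:
            w.start()

    def _worker(self) -> None:
        while True:
            task = self.tasks.get()
            if task is None:            # sentinel: this worker should exit
                self.tasks.task_done()
                break
            func, args = task
            try:
                func(*args)
            finally:
                self.tasks.task_done()

    def submit(self, func, *args) -> None:
        """Queue a task; an idle worker will pick it up."""
        self.tasks.put((func, args))

    def shutdown(self) -> None:
        """Let already-queued tasks finish, then stop every worker."""
        for _ in self.workers:
            self.tasks.put(None)        # one sentinel per worker, queued after all tasks
        for w in self.workers:
            w.join()
```

Because sentinels are queued after existing tasks and each worker exits on the first sentinel it receives, every submitted task runs before `shutdown` returns.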
Blog Creation with a NoSQL Database and the Bottle Framework
– Created a schema for storing articles, comments, usernames, and passwords in MongoDB; PyMongo is used to access MongoDB.
– The Bottle framework and its templates present the blog on the web. Text indexes were created to support searching by tag.
Twitter Sentiment Analysis
– Used Python and an OAuth library to collect a sample set of tweets; MapReduce is used for processing.
– Calculated the sentiment score of each tweet by looking up its individual words in a list of pre-computed word sentiment scores. Only English-language tweets are considered.
– Derived sentiment scores for new words from the scores of the tweets containing them. Found the frequency of each word, the happiest state, and the top ten hashtags; the user field is used to determine each user's state.
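A minimal sketch of the scoring steps above, written as plain functions rather than MapReduce jobs. It assumes a tweet's score is the sum of its known word scores and that a new word's score is the mean score of the tweets it appears in. Both rules, the whitespace tokenization, and the hashtag detection are assumptions, not necessarily the project's exact method.

```python
from collections import Counter

def tweet_score(text: str, scores: dict) -> int:
    """Sum the pre-computed scores of the tweet's words; unknown words count as 0."""
    return sum(scores.get(word, 0) for word in text.lower().split())

def new_word_scores(tweets: list, scores: dict) -> dict:
    """Assumed rule: a word missing from `scores` gets the mean score of the tweets it appears in."""
    totals: dict = {}
    counts: Counter = Counter()
    for text in tweets:
        s = tweet_score(text, scores)
        for word in set(text.lower().split()):
            if word not in scores:
                totals[word] = totals.get(word, 0) + s
                counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

def top_hashtags(tweets: list, n: int = 10) -> list:
    """Most frequent hashtags, assuming they are whitespace-separated tokens starting with '#'."""
    tags = Counter(tok for text in tweets for tok in text.split() if tok.startswith("#"))
    return [tag for tag, _ in tags.most_common(n)]
```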
Training a Support Vector Machine for Spam E-Mail Classification
– E-mails are pre-processed and their words stemmed. Only words occurring more than 100 times are considered.
– Features are extracted from each e-mail into a vector. The trained SVM achieves 99.8% training accuracy and 98.5% test accuracy.
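A minimal sketch of the pre-processing and feature-extraction steps only; the SVM itself is not shown. It assumes a binary bag-of-words vector over a vocabulary of words seen more than 100 times. The toy suffix stemmer is hypothetical and much cruder than a real stemmer such as Porter's.

```python
import re
from collections import Counter

def stem(word: str) -> str:
    """Hypothetical, deliberately crude suffix stripper (stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess(email: str) -> list:
    """Lower-case, keep alphanumeric tokens, and stem each one."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", email.lower())]

def build_vocab(emails: list, min_count: int = 100) -> list:
    """Vocabulary of stemmed words occurring more than `min_count` times overall."""
    counts = Counter(w for e in emails for w in preprocess(e))
    return sorted(w for w, c in counts.items() if c > min_count)

def features(email: str, vocab: list) -> list:
    """Binary feature vector: x[i] = 1 if vocab[i] appears in the e-mail."""
    words = set(preprocess(email))
    return [1 if v in words else 0 for v in vocab]
```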
Anomaly Detection and Recommender Systems
– Implemented a Gaussian distribution model to detect anomalous server behavior; the F1 score is used to choose the best threshold value.
– Applied a collaborative filtering learning algorithm to predict movie ratings. Implemented in Octave.
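A minimal sketch of the anomaly-detection part (collaborative filtering is not shown). Python stands in for Octave. It assumes independent per-feature Gaussians, flags an example as anomalous when p(x) < epsilon, and scans evenly spaced epsilon values on a labeled validation set; the function names and the 1000-step scan are assumptions.

```python
import math

def fit_gaussian(X):
    """Estimate a per-feature mean and variance from training rows X (features treated as independent)."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    var = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, var

def density(x, mu, var):
    """p(x) as the product of univariate Gaussian densities."""
    p = 1.0
    for xj, mj, vj in zip(x, mu, var):
        p *= math.exp(-((xj - mj) ** 2) / (2 * vj)) / math.sqrt(2 * math.pi * vj)
    return p

def select_threshold(p_val, y_val, steps=1000):
    """Scan candidate thresholds; flag p < epsilon as anomalous (y = 1) and keep the best F1."""
    best_eps, best_f1 = 0.0, 0.0
    lo, hi = min(p_val), max(p_val)
    for k in range(1, steps + 1):
        eps = lo + k * (hi - lo) / steps
        pred = [p < eps for p in p_val]
        tp = sum(1 for f, y in zip(pred, y_val) if f and y == 1)
        fp = sum(1 for f, y in zip(pred, y_val) if f and y == 0)
        fn = sum(1 for f, y in zip(pred, y_val) if not f and y == 1)
        if tp == 0:
            continue
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1
```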
K-Means Clustering and Dimensionality Reduction
Image compression with K-means:
Reduced the number of colors in an image by clustering its pixel colors into 16 clusters.
The cluster centroids are computed, and each pixel in the original image is replaced by the centroid color of its cluster. The image is compressed by a factor of 6.
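A minimal K-means sketch on RGB tuples, with Python standing in for Octave. Assuming 24-bit RGB input, storing a 4-bit index (log2 16) per pixel instead of 24 bits accounts for the factor of 6, ignoring the small 16-color palette. The random initialization from the data and the fixed iteration count are assumptions.

```python
import random

def closest(pixel, centroids):
    """Index of the centroid nearest to `pixel` (squared Euclidean distance in RGB)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(pixel, centroids[k])))

def kmeans(pixels, k=16, iters=10, seed=0):
    """Plain K-means: random initial centroids drawn from the data,
    then alternating assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)
    for _ in range(iters):
        idx = [closest(p, centroids) for p in pixels]          # assignment step
        for c in range(k):                                     # update step
            members = [p for p, i in zip(pixels, idx) if i == c]
            if members:                                        # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, [closest(p, centroids) for p in pixels]

def compress(pixels, k=16, iters=10):
    """Replace every pixel by the centroid color of its cluster."""
    centroids, idx = kmeans(pixels, k, iters)
    return [centroids[i] for i in idx]
```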
Dimensionality reduction - Application:
The feature vectors of the data set are normalized, and Principal Component Analysis (PCA) is used to reduce the dimensions: the covariance matrix is computed and singular value decomposition (SVD) is applied to it.
The data set X is projected onto the top K principal components to create the reduced-dimensional data set Z. The approximate data is also recovered from the reduced dimensions.
This was applied to a faces data set in which each face is a 32x32 image (1024 dimensions). Each face is reduced to 100 dimensions (roughly a 10x reduction) and then reconstructed.
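The PCA steps above can be summarized as follows, assuming the usual convention that X is the normalized m x n data matrix with one example per row and that U_reduce holds the first K columns of U. The notation is an assumption; the numbers n = 1024 and K = 100 come from the faces example.

```latex
\begin{aligned}
\Sigma &= \frac{1}{m} X^{\top} X, \qquad [U, S, V] = \operatorname{svd}(\Sigma) \\
U_{\text{reduce}} &= U_{:,\,1:K}, \qquad Z = X\,U_{\text{reduce}} \in \mathbb{R}^{m \times K} \\
X_{\text{approx}} &= Z\,U_{\text{reduce}}^{\top} \in \mathbb{R}^{m \times n} \\
\frac{n}{K} &= \frac{32 \times 32}{100} = \frac{1024}{100} \approx 10
\end{aligned}
```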
Recognition of Handwritten Digits (0-9) Using Logistic Regression & Neural Networks
– Implemented a one-vs-all classifier built from multiple regularized logistic regression classifiers.
– Implemented a neural network that uses backpropagation to learn its parameters and feedforward propagation to predict the digits.
– Measured the accuracy of both models. Both were implemented in Octave.
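A minimal sketch of the logistic-regression side only (the neural network is not shown), with Python standing in for Octave. It assumes each input row starts with a bias term of 1, that theta[0] is not regularized, and that all_theta[c] holds the trained parameters for "digit c vs. the rest"; training itself is omitted.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def regularized_cost(theta, X, y, lam):
    """Regularized logistic-regression cost; X rows include a leading 1 (bias)."""
    m = len(X)
    cost = 0.0
    for x, t in zip(X, y):
        h = sigmoid(sum(a * b for a, b in zip(theta, x)))
        cost += -t * math.log(h) - (1 - t) * math.log(1 - h)
    reg = lam / (2 * m) * sum(th ** 2 for th in theta[1:])   # bias term excluded
    return cost / m + reg

def predict_one_vs_all(all_theta, x):
    """Pick the class whose one-vs-all classifier outputs the highest probability."""
    return max(range(len(all_theta)),
               key=lambda c: sigmoid(sum(a * b for a, b in zip(all_theta[c], x))))
```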
Estimating PageRank Values of Wikipedia Articles using MapReduce
– Implemented the PageRank algorithm to rank internal Wikipedia articles using a Wikipedia data dump.
– Analyzed the estimated PageRank values under ideal conditions as well as with dead-end articles.
– Used Java MapReduce on the Hadoop framework.
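A minimal in-memory simulation of one MapReduce-style PageRank iteration, with Python standing in for Java/Hadoop. The map phase emits each page's rank share to its out-links and the reduce phase sums shares per page. The damping factor 0.85 and redistributing dead-end rank evenly across all pages are assumptions; the project may have handled dead ends differently.

```python
from collections import defaultdict

def pagerank_iteration(links, ranks, d=0.85):
    """links: page -> list of out-links; ranks: page -> current rank (summing to 1)."""
    n = len(ranks)
    # Map phase: each page emits (target, share of its rank) for every out-link.
    emitted = []
    dangling = 0.0
    for page, rank in ranks.items():
        outs = links.get(page, [])
        if outs:
            emitted.extend((target, rank / len(outs)) for target in outs)
        else:
            dangling += rank  # dead-end article: no out-links
    # Shuffle + reduce phase: sum contributions per target page.
    sums = defaultdict(float)
    for target, share in emitted:
        sums[target] += share
    # Assumed dead-end handling: spread their rank evenly over all pages.
    return {p: (1 - d) / n + d * (sums[p] + dangling / n) for p in ranks}
```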
Packet Analysis and Invalid Packet Capture
– Generated packets using Ostinato and injected them into an eth/wlan interface.
– Used NFQUEUE (libnetfilter_queue library) to analyze and capture packets.
Simulation of Network Interface Card (NIC) with offload capability
– The simulation models message and packet buffers, with message arrivals following a Poisson process. Implemented in Java.
– Calculated throughput, efficiency, and packet drop rate for each preset buffer size.
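A minimal sketch of a finite-buffer simulation with Poisson message arrivals, with Python standing in for Java. It assumes a single FIFO server with exponential service times, where the buffer size counts every message in the system. Efficiency is not modeled because its definition isn't given. All names and parameters are hypothetical.

```python
import random

def simulate_buffer(arrival_rate, service_rate, buffer_size, num_messages, seed=1):
    """Single FIFO server with a finite buffer; returns (throughput, drop_rate).
    Arrivals: Poisson process (exponential inter-arrival times).
    Service times: exponential (an assumption)."""
    rng = random.Random(seed)
    t = 0.0
    in_system = []              # departure times of messages still buffered or in service
    served = dropped = 0
    last_departure = 0.0
    for _ in range(num_messages):
        t += rng.expovariate(arrival_rate)             # next arrival time
        in_system = [d for d in in_system if d > t]    # forget messages that already left
        if len(in_system) >= buffer_size:
            dropped += 1                               # buffer full: message is lost
            continue
        start = in_system[-1] if in_system else t      # wait behind the last queued message
        last_departure = start + rng.expovariate(service_rate)
        in_system.append(last_departure)
        served += 1
    throughput = served / last_departure if last_departure > 0 else 0.0
    return throughput, dropped / num_messages
```

Running it once per preset buffer size gives a throughput and drop-rate curve; smaller buffers drop more messages under the same load.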
Content Searching in a Distributed Application-Layer Network – Structured & Unstructured
– Implemented unstructured and structured (Chord algorithm) P2P networks in Java and simulated both on 80 nodes.
– The structured network uses TCP for network construction and file searching; the unstructured network uses UDP.
– For the structured network, computed the standard deviation of finger table size and the per-query cost in terms of latency and hops.
– For the unstructured network, computed the standard deviation of node degree, packets, latency, and hops across all nodes, and plotted the CDF.
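A minimal sketch of how a Chord finger table is built, with Python standing in for Java: on a ring of 2^m identifiers, entry i of node n points to successor((n + 2^i) mod 2^m). The function names are hypothetical, and node IDs are given directly rather than derived by hashing.

```python
import bisect

def successor(ident, nodes):
    """First node ID >= ident on the identifier ring, wrapping to the smallest ID."""
    ring = sorted(nodes)
    i = bisect.bisect_left(ring, ident)
    return ring[i % len(ring)]

def finger_table(n, nodes, m):
    """Chord finger table of node n on a 2^m ring: entry i is successor((n + 2^i) mod 2^m)."""
    return [successor((n + 2 ** i) % (2 ** m), nodes) for i in range(m)]
```

With nodes 0, 1, and 3 on an 8-slot ring (m = 3), node 0's fingers are 1, 3, 0. Counting distinct entries per node is one way to measure finger table size.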
Undergraduate Project - Intelligent Home Automation System – Non-conventional
– A Bluetooth-controlled central microcontroller and utility microcontrollers connected through RFID.
– Sensors relay information from the surrounding environment, and a solar-charged battery powers the appliances.