The Opportunity for Excellence

When working with clients or prospects, I have always relished and appreciated an “Opportunity for Excellence”. What does that mean? It means that when a client or prospect calls with a specific, time-sensitive problem, we, as an organization and a team, jump on the problem and work to solve it.

We have had several “Opportunities for Excellence” in the past few months with our clientele. On two of them we worked through the Christmas holidays. Numerous times, several of our colleagues have pulled “all-nighters” just to respond to a client’s needs. We have met deadlines, pushed ourselves, and striven to turn out the best work we could.

This excellence at my organization can only be exhibited by its members, i.e., our employees. When will you have an “Opportunity for Excellence”? When will you be called upon to deliver success in impossible timeframes and conditions?

Let me end by telling you this. When you go the extra mile for your customer and succeed, you go from being a vendor to being a strategic partner.

Phil Hodsdon, SVP, Sales & Solutions at Bodhtree.


What is Hive? Its Interaction with Hadoop and Big Data

Hive – A Warehousing Solution Over a MapReduce Framework

What is Hive?

Hive is a data warehousing infrastructure built on top of Apache Hadoop.

Hadoop provides massive scale-out and fault-tolerance capabilities for data storage and processing (using the MapReduce programming paradigm) on commodity hardware.

Hive enables easy data summarization, ad-hoc querying and analysis of large volumes of data.

It is best used for batch jobs over large sets of immutable data (like web logs).

It provides a simple query language called HiveQL, which is based on SQL and enables users familiar with SQL to easily perform ad-hoc querying, summarization and data analysis.

At the same time, HiveQL also allows traditional MapReduce programmers to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language.
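To make this concrete, here is a minimal HiveQL sketch of both styles. The page_views table, its columns and the parse_urls.py script are hypothetical names used purely for illustration:

    -- Ordinary SQL-style summarization over a hypothetical page_views table:
    SELECT country, COUNT(*) AS views
    FROM page_views
    GROUP BY country;

    -- Plugging in a custom script (a hypothetical Python mapper) via TRANSFORM:
    ADD FILE parse_urls.py;
    SELECT TRANSFORM (url, referrer)
           USING 'python parse_urls.py'
           AS (domain, path)
    FROM page_views;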

Hive Query Language capabilities:

The Hive query language provides the basic SQL-like operations. These operations work on tables or partitions; a short sketch follows the list below.

  • Ability to create and manage tables and partitions (create, drop and alter).
  • Ability to support various Relational, Arithmetic and Logical Operators.
  • Ability to do various joins between two tables.
  • Ability to evaluate functions like aggregations on multiple “group by” columns in a table.
  • Ability to store the results of a query into another table.
  • Ability to download the contents of a table to a local directory.
  • Ability to create an external table that points to a specified location within HDFS.
  • Ability to store the results of a query in an HDFS directory.
  • Ability to plug in custom scripts using the language of choice for custom map/reduce jobs.
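
As promised above, a short HiveQL sketch exercising several of these capabilities; every table name, column and path below is hypothetical:

    -- Create and manage a partitioned table:
    CREATE TABLE logs (ts STRING, url STRING, status INT)
    PARTITIONED BY (dt STRING);

    -- Create an external table that points to an existing HDFS location:
    CREATE EXTERNAL TABLE raw_logs (ts STRING, url STRING, status INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/raw/logs';

    -- Store the results of a query into another table (one partition at a time):
    INSERT OVERWRITE TABLE logs PARTITION (dt = '2013-01-01')
    SELECT ts, url, status FROM raw_logs WHERE status <> 404;

    -- Download the results of a query to a local directory:
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/logs_export'
    SELECT url, COUNT(*) FROM logs WHERE dt = '2013-01-01' GROUP BY url;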

Major Components of Hive and their Interaction with Hadoop:

Hive provides external interfaces like the command line (CLI) and web UI, as well as application programming interfaces (APIs) like JDBC and ODBC.


The Hive Thrift Server exposes a very simple client API to execute HiveQL statements. Thrift is a framework for cross-language services, where a server written in one language (like Java) can also support clients in other languages.

The Metastore is the system catalog. All other components of Hive interact with the Metastore.

The Driver manages the life cycle of a HiveQL statement during compilation, optimization and execution.

The Compiler is invoked by the driver upon receiving a HiveQL statement. The compiler translates the statement into a plan consisting of a directed acyclic graph (DAG) of map/reduce jobs.

The driver submits the individual map/reduce jobs from the DAG to the Execution Engine in topological order. Hive currently uses Hadoop as its execution engine.
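
You can ask the compiler to show this plan with Hive’s EXPLAIN statement. The query below reuses the hypothetical page_views table from earlier, and the output in the comments is abridged and illustrative rather than exact:

    EXPLAIN
    SELECT country, COUNT(*) AS views
    FROM page_views
    GROUP BY country;

    -- Abridged, illustrative output: the statement compiles to a small DAG,
    -- with one map/reduce stage feeding the final fetch stage:
    --   STAGE DEPENDENCIES:
    --     Stage-1 is a root stage
    --     Stage-0 depends on stages: Stage-1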

What Hive is NOT

Hive is not designed for online transaction processing and does not offer real-time queries and row-level updates.

Even for interactive data browsing, queries over small data sets, or test queries, Hive aims to provide acceptable (but not optimal) latency; it trades query response time for scalability and throughput.

Hive Applications:

  • Log processing
  • Text mining
  • Document indexing
  • Customer-facing business intelligence (e.g., Google Analytics)
  • Predictive modeling, hypothesis testing

Vijaya R Kolli is a Hadoop-Big Data developer with Bodhtree.  Bodhtree specializes in BI and Big Data solutions, providing analytics consulting, implementation, data cleansing, and maintenance services.


Big Data – How does it impact an Organization?


“Data will grow to several petabytes and zettabytes in the next few years;” “Corporations will be overwhelmed by the sheer volume of data;” “By 2015 databases will grow so large that companies will have to rent space on the moon to store them.” These are just a few common quotes we hear from renowned industry analysts, experts and CIOs surrounding the topic of Big Data.

But for many observers, there’s still a lot of gray area surrounding when to apply Big Data solutions.  How is Big Data different from traditional analysis tools?  How does it impact an organization? Just because there are several terabytes of data, does that make it Big Data? Does it really matter?

We can make a bold, generic statement here: data is everywhere! When you pause to think, you’ll see it’s true. The cars we drive, the stores we shop at, the phones we use, the websites we read, the social media we so closely embrace, the TV programs we watch, the election polls we follow: everything generates data, and it is all stored somewhere.

When it comes to Big Data, people commonly mention the four Vs (volume, velocity, variety and value), but this only scratches the surface.  Traditionally, any large organization that implements ERP services has all these Vs associated with its everyday line of business.  Admittedly, this is primarily “structured” data, and it can be handled by a traditional RDBMS or an MPP database.  But consider the Vs in turn: the data technically has volume, because everything is recorded in an ERP system; the incorrect and the correct entries alike add up to millions and billions of rows.  It undoubtedly has value; otherwise corporations would not bother to collect it. It has velocity, because data arrives continuously and at irregular intervals. It also has variety; data comes in from multiple sources, such as the CRM systems, HR systems and other modules implemented within the organization.

Even if you add unstructured data, can you really derive strategic value for your business? What does this unstructured data consist of? What kinds of metrics can we extract from it?

More often than not, the Big Data buzz carries a cool factor akin to what was once attached to owning an iPhone or iPad or, back in the day, to having a Facebook account. Sometimes a personal agenda accompanies a Big Data implementation: several managers and CIOs have a vested interest in showing off a ‘cutting edge’ Big Data project.  We recommend organizations take a breather to make sure that the data problem they have actually qualifies as a Big Data problem.

…Stay tuned for a model CIOs can use to evaluate whether Big Data is the solution to their challenge…

Ajay Narayan is a presales solution architect for Bodhtree, a leader in data analytics and product engineering.  Bodhtree partners with SAP and Informatica to assist companies in leveraging enterprise data to gain a competitive edge in the market.  Bodhtree also offers the MIDAS product suite, which migrates, cleanses and integrates data between SalesForce, SAP and legacy platforms.


Navigating the Big Data Tsunami


Having just returned from a Big Data conference on the state of our technology ecosystem amid the explosion of data, I was struck by the thought that history repeats itself. You may wonder what insights history can offer, since we have never faced data challenges of this magnitude or complexity.

When faced with previous tech tsunamis, like the World Wide Web in the late 1990s or smartphones in the mid-2000s, many customers and vendors jumped on the bandwagon, trying to establish themselves as key players in the space. Despite all the money spent, only a few emerged as true game changers, while most remained spectators in those technology revolutions.

As a vendor, if you recognize this period as a tsunami of Big Data-enabled capabilities being unleashed on the market, it is important to decide early on what your role will be. Do you want to simply ride the wave and hope for a spot in a lucrative customer budget, or will you identify the core value that you and your products bring to the market and invest in it over the next 12-24 months?

The value you bring could take any form, e.g., building an army of data scientists for the market or providing a product that solves a Big Data problem. In this tsunami of capabilities there are certain areas where demand will remain constantly high, such as data cleansing: whether data is structured or unstructured, quality input to any Big Data solution is critical to generating valuable output. My recommendation to vendors playing in the Big Data revolution is to quickly define your role, build your space and stick to it. Avoid saying “We are a Big Data vendor”; it will only sound superficial and lacking in real value. Be specific about how you and your company will make your mark in this Big Data tsunami of capabilities.

Manju Devadas is VP of Solutions and Technology for Bodhtree, a leader in data analytics and product engineering.
