PIG and Big Data – Processing Massive Data Volumes at High Speed

For most organizations, availability of data is not the challenge. Rather, it’s handling, analyzing, and reporting on that data in a way that can be translated into effective decision-making.

PIG is an open source project intended to support ad-hoc analysis of very large data volumes. It allows us to process data collected from a myriad of sources such as relational databases, traditional data warehouses, unstructured internet data, machine-generated log data, and free-form text.

How does it process data?

Behind the scenes, PIG compiles a script into a series of MapReduce jobs that run on a Hadoop cluster. This spreads the load across many servers so that massive quantities of data can be processed in parallel, and capacity grows by adding more machines to the cluster.
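To make "spreading the load across many servers" concrete, here is a minimal single-machine Python sketch of the map, shuffle, and reduce phases that a MapReduce job goes through. The partitions and log lines are invented for illustration. This is not PIG's internal planner, only the general pattern that the generated jobs follow.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical data: each inner list stands in for the block of web-log
# lines stored on one server in the cluster.
partitions = [
    ["GET /home", "GET /cart", "GET /home"],
    ["GET /home", "POST /checkout"],
    ["GET /cart"],
]


def map_phase(lines):
    """Runs independently on each partition: emit (path, 1) per request."""
    return [(line.split()[1], 1) for line in lines]


def shuffle(mapped_outputs):
    """Collect every emitted value under its key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values for each key into a final count."""
    return {key: sum(values) for key, values in groups.items()}


# On a real cluster each map_phase call would run on a different machine;
# here they simply run one after another.
mapped = [map_phase(p) for p in partitions]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'/home': 3, '/cart': 2, '/checkout': 1}
```

The map step needs no knowledge of other partitions, which is what lets it run on many machines at once. Only the shuffle brings matching keys together.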

Unlike traditional BI tools, which are used to report on structured data, PIG provides a high-level data flow language in which users write step-by-step procedures that turn raw data into valuable insights. This gives it advantages in efficiency and flexibility when working with many different kinds of data.

What does PIG do?

PIG opens up the power of MapReduce to people outside the Java community. Instead of writing complex Java programs, users work with a simple procedural language that PIG layers over MapReduce, which exposes a more SQL-like (Structured Query Language) interface for big data applications.

PIG provides the common data processing operations that web search platforms rely on, such as web log processing. PIG Latin programs follow a specific format: data is read from the file system, a number of operations are performed on it (transforming it in one or more ways), and the resulting relation is written back to the file system.
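As a rough illustration of that read, transform, and write flow, the sketch below shows a short PIG Latin script in comments, followed by a plain-Python equivalent that runs on one machine. The input contents, field names (user_id, url, status), and file names are hypothetical. The Python version only mimics what the LOAD, FILTER, GROUP, FOREACH, and STORE steps compute, not how PIG executes them.

```python
# PIG Latin version (hypothetical tab-separated input 'access_log'):
#   logs   = LOAD 'access_log' USING PigStorage('\t')
#            AS (user_id:chararray, url:chararray, status:int);
#   ok     = FILTER logs BY status == 200;
#   by_url = GROUP ok BY url;
#   hits   = FOREACH by_url GENERATE group AS url, COUNT(ok) AS n;
#   STORE hits INTO 'url_hits';

import csv
import io
from collections import defaultdict

# Stand-in for the input file on the distributed file system.
RAW = "alice\t/home\t200\nbob\t/cart\t404\ncarol\t/home\t200\nbob\t/cart\t200\n"


def load(text):
    """LOAD: parse tab-separated lines into (user_id, url, status) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(user_id, url, int(status)) for user_id, url, status in reader]


def run_pipeline(text):
    logs = load(text)
    ok = [row for row in logs if row[2] == 200]               # FILTER
    by_url = defaultdict(list)                                 # GROUP
    for row in ok:
        by_url[row[1]].append(row)
    return [(url, len(rows)) for url, rows in by_url.items()]  # FOREACH ... GENERATE


def store(relation, out):
    """STORE: write the resulting relation back out as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(relation)


buffer = io.StringIO()
store(run_pipeline(RAW), buffer)
print(buffer.getvalue())
```

Each step produces a new relation from the previous one and never modifies it in place, which is the same data-flow style PIG Latin scripts use.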

PIG scripts can call functions that you define for tasks such as parsing input data, formatting output data, and even implementing operators. These user defined functions (UDFs) are typically written in Java and let PIG support custom processing. UDFs are the way to extend PIG into your particular application domain.
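The sketch below mirrors the UDF idea in Python: a custom parsing function is plugged into one step of a data flow. In PIG itself this logic would typically live in a Java class that is registered with REGISTER and called inside FOREACH ... GENERATE. The function names `device_type` and `foreach_generate` and the device-classification rules are hypothetical and deliberately simplified.

```python
# Hypothetical UDF: classify a raw user-agent string into a device type.
# In PIG this would usually be a Java class; here it is a plain Python
# function so the idea can run on its own.
TABLET_HINTS = ("ipad", "tablet")
MOBILE_HINTS = ("iphone", "android", "mobile")


def device_type(user_agent: str) -> str:
    """Deliberately simplified rules, for illustration only."""
    ua = user_agent.lower()
    if any(hint in ua for hint in TABLET_HINTS):
        return "tablet"
    if any(hint in ua for hint in MOBILE_HINTS):
        return "mobile"
    return "desktop"


def foreach_generate(rows, udf):
    """Apply the UDF to the user-agent field of every (user, agent) row."""
    return [(user, udf(agent)) for user, agent in rows]


visits = [
    ("u1", "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)"),
    ("u2", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
    ("u3", "Mozilla/5.0 (iPad; CPU OS 16_0 like Mac OS X)"),
]
print(foreach_generate(visits, device_type))
# [('u1', 'mobile'), ('u2', 'desktop'), ('u3', 'tablet')]
```

Keeping the domain logic inside one function means the surrounding pipeline stays generic. That separation is what makes UDFs a clean extension point.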

PIG allows rapid prototyping of algorithms for processing petabytes of data. It addresses data analysis challenges such as analyzing traffic logs and user consumption patterns, for example to find best-selling products.
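Finding best-selling products comes down to grouping, summing, sorting, and limiting. The sketch below shows the PIG Latin shape of that query in comments and a small Python equivalent. The relation names, field names, and sales records are hypothetical.

```python
from collections import Counter

# PIG Latin shape of the same query (names are hypothetical):
#   grouped = GROUP sales BY product;
#   totals  = FOREACH grouped GENERATE group AS product, SUM(sales.qty) AS units;
#   ranked  = ORDER totals BY units DESC;
#   top     = LIMIT ranked 2;

# Hypothetical (product, quantity) records drawn from order logs.
sales = [
    ("widget", 3),
    ("gadget", 5),
    ("widget", 4),
    ("doohickey", 1),
    ("gadget", 1),
]


def best_sellers(rows, n):
    """Sum units per product (GROUP + SUM), then keep the n largest (ORDER + LIMIT)."""
    totals = Counter()
    for product, qty in rows:
        totals[product] += qty
    return totals.most_common(n)


print(best_sellers(sales, 2))  # [('widget', 7), ('gadget', 6)]
```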

Common Use Cases:

PIG is mostly used for data pipelining: bringing in data feeds, cleansing the data, and enhancing it through transformations. Log files are a common example.
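A minimal sketch of such a pipeline on log lines is shown below, with one function per stage (ingest, cleanse, enhance). The line format, field names, and derived fields ("hour" and "section") are hypothetical. In PIG the same stages would be expressed as LOAD, FILTER, and FOREACH ... GENERATE steps over much larger inputs.

```python
from datetime import datetime

# Hypothetical raw feed: "timestamp | user | path", with the usual noise.
RAW_LINES = [
    "2012-05-01T09:15:02 | alice | /products/42 ",
    "garbage line without separators",
    "2012-05-01T10:01:44 | BOB | /cart",
    "",
]


def ingest(lines):
    """Bring in the feed, skipping empty lines."""
    return [line for line in lines if line.strip()]


def cleanse(lines):
    """Drop malformed records and normalize the fields that remain."""
    records = []
    for line in lines:
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue  # malformed: wrong number of fields
        ts, user, path = parts
        records.append({"ts": ts, "user": user.lower(), "path": path})
    return records


def enhance(records):
    """Add derived fields: hour of the request and top-level site section."""
    enriched = []
    for rec in records:
        hour = datetime.fromisoformat(rec["ts"]).hour
        section = rec["path"].strip("/").split("/")[0]
        enriched.append({**rec, "hour": hour, "section": section})
    return enriched


pipeline_output = enhance(cleanse(ingest(RAW_LINES)))
for rec in pipeline_output:
    print(rec["user"], rec["hour"], rec["section"])
```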

PIG is also used for iterative data processing, which allows time-sensitive updates to a dataset. A common example is “Bulletin”, where a constant inflow of small pieces of new data replaces the older feeds every few minutes.
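One way to model this kind of update is to combine the current dataset with the small new batch, group the rows by key, and keep only the newest row for each key. The sketch below shows that pattern in Python, with a rough PIG Latin outline in comments. The (id, timestamp, headline) record layout and all names are hypothetical. This is one common approach, not necessarily how any particular “Bulletin” system is built.

```python
from itertools import groupby
from operator import itemgetter

# Rough PIG Latin outline (names are hypothetical):
#   combined = UNION current, delta;
#   by_id    = GROUP combined BY id;
#   latest   = FOREACH by_id {
#                  newest_first = ORDER combined BY ts DESC;
#                  newest       = LIMIT newest_first 1;
#                  GENERATE FLATTEN(newest);
#              };


def refresh(current, delta):
    """Merge a small delta feed into the current dataset, keeping the newest row per id."""
    combined = sorted(current + delta, key=itemgetter(0))  # UNION, then sort for groupby
    latest = []
    for _, rows in groupby(combined, key=itemgetter(0)):    # GROUP BY id
        latest.append(max(rows, key=itemgetter(1)))          # keep the newest timestamp
    return latest


current = [("a", 100, "Old A"), ("b", 100, "Old B")]
delta = [("b", 160, "New B"), ("c", 160, "New C")]
print(refresh(current, delta))
# [('a', 100, 'Old A'), ('b', 160, 'New B'), ('c', 160, 'New C')]
```

Because each run touches only the existing data plus a small delta, the job can be repeated every few minutes as new items arrive.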

Sailaja Bhagavatula specializes in SAP Business Objects and Hadoop for Bodhtree, a business analytics services company focused on helping customers get maximum value from their data. Bodhtree not only implements the tools that enable processing and analysis of massive volumes of data; it also helps businesses ensure that the questions being asked target key factors for long-term growth.


What BI Can and Cannot Do for Growing Midsize Enterprises

As data expands in volume and complexity, it affects, transforms, and disrupts the way midsize companies conduct business. As these companies grow, the way employees interact with that data changes too: more users are accessing it, and they want to access it in different ways than ever before.

For instance, there are inventory managers who need self-service reporting and cannot wait weeks for reports. There are sales managers who need to drill up or down into sales by territory, SKU, and product family. There are marketing managers who want to analyze campaigns against prior results, but historical data is not retained. There are finance pros who need to identify and prioritize problem areas in real time. And then there are IT managers who must enhance reporting environments but lack the staff, budget, time, and expertise to deploy and maintain BI.

And that’s probably just a partial list of what’s happening at your company.

Today’s BI software can help midsize companies grappling with these growth challenges. However, there are many things BI software can’t do. Setting clear expectations and knowing the difference between the two will help midsize companies get value from their BI software purchases. Companies that start an enterprise information management (EIM) and BI project gain important capabilities for data integration, data quality, and master data management. These capabilities let them give users access to trusted information, overcome silos and data inconsistencies, and uncover hidden assets.

BI gives business users access to the information they’re looking for through several different types of reporting, including managed, ad-hoc, analytics, dashboards, production, and operational reporting. Users can consume that information in whichever form they prefer, which gives them a range of insights for managing performance and the ability to:

• Access and transform corporate data into highly formatted reports for greater insight
• Leverage dashboards to visualize data trends for better decision-making capabilities
• Analyze trends in complex historical data to improve forecasts
• Interact with information for fast response to ad-hoc questions
• Find immediate answers to business questions
• Deliver information across the organization for an empowered mobile workforce

While these advantages provide critical insights to help guide business growth, organizations cannot expect BI to be a magic bullet to resolve all issues. We advise caution in order to build the best solution and set more realistic expectations for users. Remember: Tools alone will not transform your business.

Three Things BI Can’t Do for Midsize Enterprises

1. Solve All the Problems at Once

Focus on the top business and IT issues, and then prioritize from there. A clear, tangible roadmap will help determine what can be accomplished in both the short and long term. A phased approach based on common priorities across business units is critical.

For example, Bodhtree recently helped a global provider of broadband communication in the high-tech industry assess its enterprise readiness for business analytics. We determined where the organization stood in the analytics lifecycle, including the current state of its data warehouse technology environment, its sources of information, and its overall architecture. Then we outlined the short-term, mid-term, and long-term steps needed to make analytics work for the organization.

The results were tangible and actionable: a strategic roadmap captured the overall picture and health of analytics within the organization. The roadmap also showed how the business drivers related to resolving the company’s business and IT needs and challenges, as well as its most pressing issues. Within weeks, the organization understood what hardware it needed to procure, what software solution would work for it, which issues it needed to resolve first for its line-of-business groups, and, most importantly, what level of effort and confidence was required to execute all plans.

2. Resolve All Data Integrity Issues

BI can help identify, extract, profile and analyze where data quality problems exist. But it’s up to the business to make changes through a clear business process.

For example, Bodhtree implemented a world-class business analytics solution for a global consumer electronics leader. Key transactional data were integrated from a myriad of sources: demand generation, order management, web click stream, promotions, phone switches, campaign management, products, discounts or coupons, returns and warranties, POS and service centers. Yet, troubling issues surfaced related to the business process in campaign generation and management. The BI solution was able to identify the data quality problems, but it was up to the organization to revisit the way campaigns would be generated and segmented at their initial conception to enable a more efficient and effective way to track performance and ROI.

3. Ensure User Adoption

An “if you build it, they will come” approach won’t satisfy all users. It’s important to include users in the BI planning process so that they can eventually become BI evangelists.

A prime example was a “single source of information” solution Bodhtree delivered for a global medical device maker, which gave the company an accurate, timely, trusted, integrated view of its business. A critical part of this business analytics project was the device maker’s organizational readiness, which was emphasized right from the start. Bodhtree made certain to involve business users at various stages of the implementation lifecycle (strategy, assessment, tool selection, design, and implementation). This allowed some of the early adopters to become change agents and analytics evangelists who helped other users adapt to and accept the new platform and solution, and it ultimately led to the success of the project.

Many BI software vendors, including SAP, have recognized the value of the midmarket and are offering attractive pricing on solutions that meet traditional business needs. Numerous BI solutions are available to SME customers today, and some claim to resolve “problems” within weeks. However, they may not be able to scale with your business growth or flex with your changing needs. Choose wisely.

Kain A. Sosa is Vice President of Analytics at Bodhtree, a global provider of IT business consulting and solution integration services headquartered in Fremont, Calif. Bodhtree maximizes customer potential by offering value-added solutions around four key areas of focus: Analytics, Cloud, Mobility and Enterprise services. Bodhtree is a gold channel partner in the SAP PartnerEdge™ program and a master value-added reseller of SAP BusinessObjects Business Intelligence (SAP BI) solutions, with extensive expertise in delivering BI solutions built on SAP technology.

Published in ASUGnews on April 3, 2012: http://www.asugnews.com/2012/04/03/what-bi-can-and-cannot-do-for-growing-midsize-enterprises/
