PIG and Big Data – Processing Massive Data Volumes at High Speed

For most organizations, availability of data is not the challenge. Rather, the challenge is handling, analyzing, and reporting on that data in a way that can be translated into effective decision-making.

PIG is an open source project intended to support ad-hoc analysis of very large data volumes. It allows us to process data collected from a myriad of sources such as relational databases, traditional data warehouses, unstructured internet data, machine-generated log data, and free-form text.

How does PIG process data?

PIG builds complex jobs behind the scenes, compiling each script into MapReduce jobs that spread the load across many servers. This lets it process massive quantities of data in a parallel environment that scales out as servers are added.

Unlike traditional BI tools that are used to report on structured data, PIG is a high-level data flow language in which you write step-by-step procedures over raw data to derive valuable insights. It offers major advantages in efficiency and in the flexibility to access different kinds of data.

What does PIG do?

PIG opens up the power of MapReduce to the non-Java community. It avoids the complexity of writing Java programs by providing a simple procedural language abstraction over MapReduce, exposing a more Structured Query Language (SQL)-like interface for big data applications.
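To give a sense of what PIG abstracts away, here is a minimal single-machine sketch in Python of the three MapReduce phases for a word count. It is illustrative only: the function names are hypothetical, and a real Hadoop job would run these phases across a cluster rather than in one process.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases, as a MapReduce framework would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "big pig"]))
```

Even this toy version needs three functions for one aggregation, which is the kind of boilerplate a higher-level data flow language is meant to hide.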

PIG provides common data processing operations for tasks that web search platforms run routinely, such as web log processing. Its language, PIG Latin, follows a specific format: data is read from the file system, a number of operations are performed on the data (transforming it in one or more ways), and the resulting relation is written back to the file system.
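The read, transform, and write pattern described above can be sketched in plain Python. This is an analogue of the data flow, not PIG Latin itself, and it assumes a hypothetical tab-separated input of (user, url, status) records; the function names are also made up for illustration.

```python
import csv
from collections import Counter


def transform(records):
    """Filter to successful requests, then group by URL and count hits."""
    ok = (r for r in records if len(r) == 3 and r[2] == "200")
    hits = Counter(url for _user, url, _status in ok)
    # Most-visited first; ties broken alphabetically for stable output.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


def run_flow(input_path, output_path):
    # Read: load tab-separated (user, url, status) records from the file system.
    with open(input_path, newline="") as src:
        records = list(csv.reader(src, delimiter="\t"))
    # Transform: one or more operations applied to the loaded data.
    result = transform(records)
    # Write: store the resulting relation back to the file system.
    with open(output_path, "w", newline="") as dst:
        csv.writer(dst, delimiter="\t", lineterminator="\n").writerows(result)
    return result
```

In PIG Latin the same steps would roughly correspond to the LOAD, FILTER, GROUP, and STORE operators, with the cluster handling the parallelism.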

PIG scripts can call functions that you define for tasks such as parsing input data or formatting output data, and even for custom operators. These UDFs (user-defined functions) are most commonly written in Java and permit PIG to support custom processing. UDFs are the way to extend PIG into your particular application domain.
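As a rough illustration of the idea behind a UDF, the Python sketch below plugs a user-written parsing function into a generic processing step. It does not use PIG's actual UDF interface; the log format, the pattern, and both function names are assumptions made for this example.

```python
import re

# Hypothetical log format: '<ip> <METHOD> <path> <status>'
LOG_PATTERN = re.compile(r"^(\S+) (GET|POST) (\S+) (\d{3})$")


def parse_log_line(line):
    """User-defined parser: return (ip, method, path, status), or None if malformed."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    ip, method, path, status = match.groups()
    return ip, method, path, int(status)


def apply_udf(lines, udf):
    """Apply a user-defined function to each record, dropping rejected ones."""
    return [parsed for parsed in map(udf, lines) if parsed is not None]
```

The generic step knows nothing about log formats; all domain knowledge lives in the function passed to it, which is the same separation a UDF gives a PIG script.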

PIG allows rapid prototyping of algorithms for processing petabytes of data. It effectively addresses data analysis challenges such as traffic log analysis and user consumption patterns to find things like best-selling products.

Common Use Cases:

PIG is mostly used for data pipelining, which includes bringing in a data feed, cleansing the data, and enhancing it through transformations. A common example is processing log files.

PIG is also used for iterative data processing, which allows time-sensitive updates to a dataset. A common example is “Bulletin”, which involves a constant inflow of small pieces of new data that replace the older feeds every few minutes.
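The update pattern in the second use case can be sketched in Python as follows. This is only an illustration of newer items replacing older ones, not how PIG implements it; the `id` and `ts` field names and the `apply_feed` function are hypothetical.

```python
def apply_feed(dataset, feed):
    """Return a new dataset in which newer feed items replace older ones by id."""
    updated = dict(dataset)  # copy, so the previous snapshot stays intact
    for item in feed:
        current = updated.get(item["id"])
        # Keep the incoming item only if it is new or at least as recent.
        if current is None or item["ts"] >= current["ts"]:
            updated[item["id"]] = item
    return updated
```

Each small batch would be applied to the latest snapshot every few minutes, so readers always see the most recent version of each item.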

Sailaja Bhagavatula specializes in SAP Business Objects and Hadoop for Bodhtree, a business analytics services company focused on helping customers get maximum value from their data. Bodhtree not only implements the tools that enable processing and analysis of massive volumes of data but also helps businesses ensure that the questions being asked target key factors for long-term growth.


Extending SAP Business Objects to All Organizational Decision-Makers

BI tools play a vital role in decision-making and innovation at every level in dynamic organizations. SAP Business Objects includes tools that help expand the reach of BI information services, enabling the organization to share, integrate, and embed BI in applications, services, tools, and business processes.

Unification of the BI data used across applications

BI can be used across multiple functions and is generally not specific to any particular department or team. It can be leveraged across applications related to finance, operations, sales or human resources. SAP Business Objects Enterprise provides a unified structure with a powerful semantic layer and integration capability that brings a “single version of the truth” to data drawn from multiple sources.

Share BI with any service-enabled application

To build applications that extend the advantage of a company’s BI investment, developers can use the SAP Business Objects BI Software Development Kits (SDKs). These SDKs can be used in any Java or .NET-based application for authentication, authorization, scheduling, content display, ad hoc query, or server administration. SAP Business Objects also offers a comprehensive set of Web Services that expose BI functionality through a platform-agnostic interface. The software also supports your organization by extending the reach of BI beyond traditional corporate business.

SAP Business Objects Web Services enhance support for your tactical and operational decision-making, which improves business process efficiency.

Sridurga Vannemreddi is an SAP Business Objects and Big Data developer with Bodhtree. For more than a decade, Bodhtree has specialized in business analytics, leveraging close partnerships with leading BI software manufacturers such as SAP Business Objects, Informatica, and IBM Cognos. Bodhtree offers free assessments to map analytics solutions to the goals and objectives specific to your organization.
