DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI


Department of Information Technology

IT6701 - INFORMATION MANAGEMENT Anna University 2 & 16 Mark Questions & Answers

Year / Semester: IV / VII Regulation: 2013 Academic year: 2017 - 2018

UNIT I

PART-A

1. What is a data model? List the types of data model used.
A database model is the theoretical foundation of a database and fundamentally determines in which manner data can be stored, organized, and manipulated in a database system. It thereby defines the infrastructure offered by a particular database system. The most popular example of a database model is the relational model.

Types of data model used:
• Hierarchical model
• Network model
• Relational model
• Entity-relationship model
• Object-relational model
• Object model

2. Define database management system. List some applications of DBMS.
A Database Management System (DBMS) is a collection of interrelated data and a set of programs to access those data.

Applications of DBMS:
• Banking
• Airlines
• Universities
• Credit card transactions
• Telecommunication
• Finance
• Sales
• Manufacturing
• Human resources

3. Give the levels of data abstraction.
• Physical level
• Logical level
• View level

4. Define data model.
A data model is a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints.

5. What is an entity relationship model?
The entity relationship model is a collection of basic objects called entities and relationships among those objects. An entity is a thing or object in the real world that is distinguishable from other objects.

6. What are attributes and relationship? Give examples.
• An entity is represented by a set of attributes.
• Attributes are descriptive properties possessed by each member of an entity set.
• Example: possible attributes of a customer entity are customer name, customer id, customer street, customer city.
• A relationship is an association among several entities.
• Example: a depositor relationship associates a customer with each account that he/she has.

7. Define single valued and multivalued attributes.
• Single valued attributes: attributes with a single value for a particular entity are called single valued attributes.
• Multivalued attributes: attributes with a set of values for a particular entity are called multivalued attributes.

8. What is meant by normalization of data?
It is a process of analyzing the given relation schemas based on their Functional Dependencies (FDs) and primary keys to achieve the following properties:
• Minimizing redundancy
• Minimizing insertion, deletion and update anomalies
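Normalization is driven by functional dependencies, so it can help to see one checked on data. The following is a minimal sketch only: the `employees` relation, its column names and the helper `fd_holds` are hypothetical and are not part of the original notes. It tests whether an FD holds in a small table and then decomposes the table so that each department name is stored once.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical unnormalized relation: dept_name is repeated for every employee.
employees = [
    {"emp_id": 1, "emp_name": "Asha", "dept_id": 10, "dept_name": "IT"},
    {"emp_id": 2, "emp_name": "Ravi", "dept_id": 10, "dept_name": "IT"},
    {"emp_id": 3, "emp_name": "Meena", "dept_id": 20, "dept_name": "CSE"},
]

print(fd_holds(employees, ["dept_id"], ["dept_name"]))  # True
print(fd_holds(employees, ["dept_id"], ["emp_name"]))   # False

# Because dept_id -> dept_name holds, the relation can be decomposed so that
# each department name is stored exactly once (less redundancy, fewer anomalies).
departments = {row["dept_id"]: row["dept_name"] for row in employees}
staff = [
    {"emp_id": r["emp_id"], "emp_name": r["emp_name"], "dept_id": r["dept_id"]}
    for r in employees
]
```

Renaming a department now touches one entry in `departments` instead of every matching employee row, which is the update anomaly that normalization aims to avoid.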

9. Define - Entity set and Relationship set.
• Entity set: the set of all entities of the same type is termed an entity set.
• Relationship set: the set of all relationships of the same type is termed a relationship set.

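10. What are stored, derived, composite attributes?
• Stored attributes: the attributes stored in a database are called stored attributes.
• Derived attributes: the attributes that are derived from the stored attributes are called derived attributes. For example, the Age attribute is derived from the DOB attribute.
• Composite attributes: attributes that can be divided into smaller subparts are called composite attributes. For example, a Name attribute made up of first name and last name.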
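The Age-from-DOB example can be sketched as a small function. This is an illustrative sketch only; the function name `age_on` and the sample dates are assumptions, not part of the original notes.

```python
from datetime import date


def age_on(dob, today):
    """Derive the age in completed years from a stored date of birth."""
    years = today.year - dob.year
    # Subtract one year if the birthday has not yet occurred in today's year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


print(age_on(date(1996, 8, 15), date(2017, 7, 1)))   # 20
print(age_on(date(1996, 8, 15), date(2017, 8, 15)))  # 21
```

Only the date of birth needs to be stored; the age is recomputed whenever it is needed, so it can never go stale.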

11. Define - null values.
In some cases a particular entity may not have an applicable value for an attribute, or the value of an attribute for a particular entity may not be known. In these cases a null value is used.

12. What is meant by the degree of relationship set?
The degree of a relationship type is the number of participating entity types.

13. Define - Weak and Strong Entity Sets
• Weak entity set: an entity set that does not have a key attribute of its own is called a weak entity set.
• Strong entity set: an entity set that has a primary key is termed a strong entity set.

14. What does the cardinality ratio specify?
Mapping cardinalities or cardinality ratios express the number of entities to which another entity can be associated. Mapping cardinalities must be one of the following:
• One to one
• One to many
• Many to one
• Many to many

15. What are the two types of participation constraint?
• Total: the participation of an entity set E in a relationship set R is said to be total if every entity in E participates in at least one relationship in R.
• Partial: if only some entities in E participate in relationships in R, the participation of entity set E in relationship R is said to be partial.

16. What is a candidate key and primary key?
• Candidate key: minimal super keys are called candidate keys.
• Primary key: the candidate key chosen by the database designer as the principal means of identifying an entity in the entity set.
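The idea of a candidate key as a minimal super key can be illustrated on sample data. The sketch below is hypothetical: the `students` relation and the helpers `is_superkey` and `candidate_keys` are assumptions made for illustration. Keys are really a property of the schema, so uniqueness observed in a single table instance only suggests a key; it does not prove one.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows, attributes):
    """Return the minimal super keys of this table instance."""
    keys = []
    # Try smaller attribute sets first so that minimal keys are found first.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical student relation.
students = [
    {"roll_no": 1, "email": "a@x.edu", "name": "Asha", "dept": "IT"},
    {"roll_no": 2, "email": "b@x.edu", "name": "Ravi", "dept": "IT"},
    {"roll_no": 3, "email": "c@x.edu", "name": "Asha", "dept": "CSE"},
    {"roll_no": 4, "email": "d@x.edu", "name": "Asha", "dept": "IT"},
]

print(candidate_keys(students, ["roll_no", "email", "name", "dept"]))
# [('roll_no',), ('email',)]
```

Both `roll_no` and `email` are candidate keys here; the designer would pick one of them, for example `roll_no`, as the primary key.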

17. Define - Business Rules.
Business rules are an excellent tool to document the various aspects of a business domain. For example: a student is evaluated for a course through a combination of theory and practical examinations.

18. What is JDBC? List the JDBC drivers.
Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database. It is part of the Java Standard Edition platform, from Oracle Corporation.
• Type 1 - JDBC-ODBC Bridge Driver.
• Type 2 - Java Native Driver.
• Type 3 - Java Network Protocol Driver.
• Type 4 - Pure Java Driver.

19. What are the steps involved to access the database using JDBC?
• Registering the JDBC driver
• Creating the database connection
• Executing queries
• Processing the results
• Closing the database connection
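The same five-step sequence appears in most database access APIs. JDBC itself is a Java API, so the sketch below is not JDBC: it uses Python's standard `sqlite3` module as a stand-in, with an in-memory database and a hypothetical `customer` table, purely to show the order of the steps.

```python
import sqlite3  # Step 1: load the driver module (the counterpart of registering a JDBC driver).

# Step 2: create the database connection (an in-memory database, for illustration only).
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT)"
    )
    cur.executemany(
        "INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")]
    )

    # Step 3: execute a parameterized query (comparable in spirit to a JDBC PreparedStatement).
    cur.execute(
        "SELECT customer_id, customer_name FROM customer WHERE customer_id = ?", (2,)
    )

    # Step 4: process the results.
    rows = cur.fetchall()
    for customer_id, customer_name in rows:
        print(customer_id, customer_name)  # 2 Ravi
finally:
    # Step 5: close the database connection, even if a query failed.
    conn.close()
```

In Java the corresponding calls would go through `DriverManager`, `Connection`, a statement object and `ResultSet`; the order of the steps is the same.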

20. What are the three types of statement used to execute queries in Java?
• Statement
• PreparedStatement
• CallableStatement

21. What is a stored procedure?
• In a database management system (DBMS), a stored procedure is a set of Structured Query Language (SQL) statements with an assigned name that is stored in the database in compiled form so that it can be shared by a number of programs.
• The use of stored procedures can be helpful in controlling access to data, preserving data integrity and improving productivity.

22. What do the four V’s of Big Data denote?
IBM has a simple explanation for the four critical features of big data:
• Volume – Scale of data
• Velocity – Analysis of streaming data
• Variety – Different forms of data
• Veracity – Uncertainty of data

23. List out the companies that use Hadoop.
• Yahoo (one of the biggest users and contributor of more than 80% of the Hadoop code)
• Facebook
• Netflix
• Amazon
• Adobe
• eBay
• Twitter

24. Distinguish between Structured and Unstructured data.
• Data which can be stored in traditional database systems in the form of rows and columns, for example online purchase transactions, is referred to as structured data.
• Data which can be stored only partially in traditional database systems, for example data in XML records, is referred to as semi-structured data.
• Unorganized and raw data that cannot be categorized as semi-structured or structured data is referred to as unstructured data. Facebook updates, tweets on Twitter, reviews, web logs, etc. are all examples of unstructured data.

25. On what concept does the Hadoop framework work?
The Hadoop framework works on the following two core components:
• HDFS – Hadoop Distributed File System: the Java-based file system for scalable and reliable storage of large datasets. Data in HDFS is stored in the form of blocks and it operates on the master-slave architecture.
• Hadoop MapReduce: the Java-based programming paradigm of the Hadoop framework that provides scalability across various Hadoop clusters.

26. What are the main components of a Hadoop application?
Hadoop applications draw on a wide range of technologies that provide great advantage in solving complex business problems. The core components of a Hadoop application are:
• Hadoop Common
• HDFS
• Hadoop MapReduce
• YARN
• Data access components: Pig and Hive
• Data storage component: HBase
• Data integration components: Apache Flume, Sqoop
• Data management and monitoring components: Ambari, Oozie and Zookeeper
• Data serialization components: Thrift and Avro
• Data intelligence components: Apache Mahout and Drill

27. Whenever a client submits a Hadoop job, who receives it?
• In Hadoop 1.x the client submits the job to the JobTracker. The NameNode looks up the data requested by the client and provides the block information.
• The JobTracker takes care of resource allocation for the Hadoop job to ensure timely completion.

28. What are the partitioning, shuffle and sort phases?
• Shuffle phase: once the first map tasks are completed, the nodes continue to perform several other map tasks and also exchange the intermediate outputs with the reducers as required. This process of moving the intermediate outputs of map tasks to the reducer is referred to as shuffling.
• Sort phase: Hadoop MapReduce automatically sorts the set of intermediate keys on a single node before they are given as input to the reducer.
• Partitioning phase: the process that determines which intermediate keys and values will be received by each reducer instance is referred to as partitioning. The destination partition is the same for any key irrespective of the mapper instance that generated it.

29. Distinguish between HBase and Hive.
• HBase and Hive are completely different Hadoop-based technologies. Hive is a data warehouse infrastructure on top of Hadoop, whereas HBase is a NoSQL key-value store that runs on top of Hadoop.
• Hive helps SQL-savvy people to run MapReduce jobs, whereas HBase supports four primary operations: put, get, scan and delete.
• HBase is ideal for real-time querying of big data, whereas Hive is an ideal choice for analytical querying of data collected over a period of time.
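The partitioning, shuffle and sort phases described in question 28 can be modelled in a few lines. The following is a simplified, single-process sketch and not Hadoop code: the function names, the word-count job and the use of `zlib.crc32` as the partitioning hash are assumptions made for illustration. In a real cluster these phases run across many nodes.

```python
import zlib
from collections import defaultdict


def map_words(line):
    """Map step: emit an intermediate (word, 1) pair for every word."""
    return [(word, 1) for word in line.split()]


def partition(key, num_reducers):
    """Partitioning: a deterministic hash sends a given key to the same
    reducer no matter which mapper produced it."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2):
    # Shuffle: move every intermediate pair to the reducer chosen by partition().
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each split stands for the input of one mapper
        for line in split:
            for key, value in map_words(line):
                reducer_inputs[partition(key, num_reducers)][key].append(value)

    # Sort: each reducer sees its keys in sorted order; reduce: sum the values.
    results = []
    for grouped in reducer_inputs:
        results.append([(key, sum(grouped[key])) for key in sorted(grouped)])
    return results


splits = [["hive hbase hive"], ["hbase yarn hive"]]
output = run_job(splits, num_reducers=2)

# Merge the per-reducer outputs to see the overall word counts.
totals = {key: count for part in output for key, count in part}
print(totals == {"hive": 3, "hbase": 2, "yarn": 1})  # True
```

Each word ends up in exactly one reducer's output, which is the guarantee the partitioning phase provides.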

30. Distinguish between Hadoop 1.x and Hadoop 2.x.
• In Hadoop 1.x, MapReduce is responsible for both processing and cluster management, whereas in Hadoop 2.x processing is taken care of by other processing models and YARN is responsible for cluster management.
• Hadoop 2.x scales better than Hadoop 1.x, with close to 10000 nodes per cluster.
• Hadoop 1.x has a single point of failure problem, and whenever the NameNode fails it has to be recovered manually. In Hadoop 2.x, however, the StandBy NameNode overcomes the problem, and whenever the NameNode fails it is configured for automatic recovery.

PART B

1. Describe database design and database modelling.
2. Explain normalization in detail with suitable examples.
3. Explain JDBC drivers and how to access a database using them.
4. Explain the Hadoop ecosystem.
5. Write short notes on the following:
• YARN
• NoSQL
• Hive