Role of Spark or Hive Metastore

Let us understand the role of the Spark Metastore (or Hive Metastore). To do so, we first need to understand the metadata that is generated for Spark Metastore tables.

  • When we create a Spark Metastore table, there is metadata associated with it.

    • Table Name

    • Column Names and Data Types

    • Location

    • File Format

    • and more

  • This metadata has to be stored somewhere so that query engines such as Spark SQL can access it to serve our queries.
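To make the list above concrete, here is a minimal Python sketch of the kind of record kept for each table, and of how an engine might consult it to reject a query that references a column the table does not have. `TableMetadata`, `resolve_columns`, and the sample `orders` table are hypothetical illustrations, not part of Spark's API; a real metastore records considerably more than these few fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical, simplified view of the metadata kept for one table."""

    name: str
    columns: dict[str, str]  # column name -> data type
    location: str  # directory holding the table's data files
    file_format: str  # e.g. "textfile" or "parquet"
    properties: dict[str, str] = field(default_factory=dict)  # "and more"


def resolve_columns(table: TableMetadata, requested: list[str]) -> dict[str, str]:
    """Return the data type of each requested column, failing if any is unknown.

    This mimics the kind of check an engine performs against the metadata
    before it runs a query.
    """
    missing = [c for c in requested if c not in table.columns]
    if missing:
        raise ValueError(f"Column(s) {missing} not found in table {table.name}")
    return {c: table.columns[c] for c in requested}


# Hypothetical table; the location is only an example path.
orders = TableMetadata(
    name="orders",
    columns={"order_id": "int", "order_date": "string", "order_status": "string"},
    location="/user/hive/warehouse/orders",
    file_format="textfile",
)

print(resolve_columns(orders, ["order_id", "order_status"]))
# {'order_id': 'int', 'order_status': 'string'}
```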

Let us understand where the metadata is stored.

  • This information is typically stored in a relational database, which is called the metastore.

  • Hive and the Spark SQL engine use it extensively to validate queries (for example, to check that the referenced tables and columns exist) as well as to execute them.

  • In our case, it is stored in a MySQL database. Let us review the details by going through the relevant properties.
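The idea of a relational metastore can be sketched in Python with an in-memory SQLite database (via the standard `sqlite3` module) standing in for MySQL. The table names `TBLS`, `SDS`, and `COLUMNS_V2` are borrowed from the Hive metastore schema, but their columns here are simplified and hypothetical, as are the helpers `create_metastore`, `register_table`, and `lookup_table`. The sketch only shows the principle: creating a table writes rows, and an engine reads them back before it runs a query. In a real deployment, the connection to the MySQL database is typically configured through properties such as `javax.jdo.option.ConnectionURL`.

```python
from __future__ import annotations

import sqlite3


def create_metastore() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, hypothetical metastore schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE SDS (                 -- where and how the data is stored
            SD_ID INTEGER PRIMARY KEY,
            LOCATION TEXT NOT NULL,
            FILE_FORMAT TEXT NOT NULL
        );
        CREATE TABLE TBLS (                -- one row per table
            TBL_ID INTEGER PRIMARY KEY,
            TBL_NAME TEXT NOT NULL UNIQUE,
            SD_ID INTEGER NOT NULL REFERENCES SDS (SD_ID)
        );
        CREATE TABLE COLUMNS_V2 (          -- one row per column
            SD_ID INTEGER NOT NULL REFERENCES SDS (SD_ID),
            COLUMN_NAME TEXT NOT NULL,
            TYPE_NAME TEXT NOT NULL,
            INTEGER_IDX INTEGER NOT NULL   -- column position
        );
        """
    )
    return conn


def register_table(conn: sqlite3.Connection, name: str,
                   columns: list[tuple[str, str]], location: str, file_format: str) -> None:
    """Store a table's metadata, roughly what CREATE TABLE records behind the scenes."""
    with conn:  # commits on success, rolls back on error
        sd_id = conn.execute(
            "INSERT INTO SDS (LOCATION, FILE_FORMAT) VALUES (?, ?)",
            (location, file_format),
        ).lastrowid
        conn.execute("INSERT INTO TBLS (TBL_NAME, SD_ID) VALUES (?, ?)", (name, sd_id))
        conn.executemany(
            "INSERT INTO COLUMNS_V2 (SD_ID, COLUMN_NAME, TYPE_NAME, INTEGER_IDX) "
            "VALUES (?, ?, ?, ?)",
            [(sd_id, col, typ, idx) for idx, (col, typ) in enumerate(columns)],
        )


def lookup_table(conn: sqlite3.Connection, name: str) -> dict:
    """Fetch what an engine needs to know about `name` before querying it."""
    row = conn.execute(
        "SELECT s.SD_ID, s.LOCATION, s.FILE_FORMAT "
        "FROM TBLS t JOIN SDS s ON t.SD_ID = s.SD_ID "
        "WHERE t.TBL_NAME = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise LookupError(f"Table not found: {name}")
    sd_id, location, file_format = row
    columns = conn.execute(
        "SELECT COLUMN_NAME, TYPE_NAME FROM COLUMNS_V2 WHERE SD_ID = ? ORDER BY INTEGER_IDX",
        (sd_id,),
    ).fetchall()
    return {"name": name, "columns": columns, "location": location, "file_format": file_format}


metastore = create_metastore()
register_table(metastore, "orders", [("order_id", "int"), ("order_status", "string")],
               "/user/hive/warehouse/orders", "textfile")
print(lookup_table(metastore, "orders")["columns"])
# [('order_id', 'int'), ('order_status', 'string')]
```

Because the metadata lives in an ordinary database rather than in one engine's memory, it outlives any single session and can be shared by different engines, such as Hive and Spark SQL, that point at the same metastore.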