Sunday, June 29, 2014

Impala architecture

1. Impala Daemon

The core Impala component is a daemon process that runs on each node of the cluster.
It reads and writes data files; accepts queries submitted from the impala-shell command, Hue, JDBC, or ODBC; parallelizes each query and distributes the work to other nodes in the Impala cluster; and transmits intermediate query results back to the central coordinator node, which is the daemon that received the query.
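
To make the coordinator's role concrete, here is a minimal sketch of the fan-out and merge pattern described above. It is not Impala code: threads stand in for worker nodes, and the names scan_partition and coordinate_count are hypothetical, chosen only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, predicate):
    """Work done by one hypothetical worker: count matching rows locally."""
    return sum(1 for row in rows if predicate(row))


def coordinate_count(partitions, predicate):
    """Hypothetical coordinator: fan out one task per partition, then merge."""
    if not partitions:
        return 0
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(
            pool.map(lambda part: scan_partition(part, predicate), partitions)
        )
    # Merging the intermediate results is the coordinator's final step.
    return sum(partials)


# Three "nodes", each holding a slice of the data.
partitions = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]
total = coordinate_count(partitions, lambda value: value > 5)
print(total)  # 5
```

Each worker returns only a partial count, so the coordinator merges a few small numbers instead of moving every row across the network.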

2. Statestore

The statestore (the statestored process) checks the health of the Impala daemons on all the nodes in a cluster and continuously relays its findings to each of those daemons. If an Impala node goes offline because of a hardware failure, network error, software issue, or any other reason, the statestore informs all the other nodes so that future queries avoid sending requests to the unreachable node.
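
The health-check-and-broadcast idea can be sketched as a toy model. This is an assumption-laden illustration, not the real statestore protocol: the ToyStatestore class, the heartbeat timeout, and the explicit now argument (used instead of a real clock to keep the example deterministic) are all hypothetical.

```python
class ToyStatestore:
    """Toy model: track heartbeats and broadcast the set of live nodes."""

    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s
        self.last_heartbeat = {}  # node name -> time of last heartbeat
        self.subscribers = {}     # node name -> callback taking the live set

    def register(self, node, on_membership, now):
        self.subscribers[node] = on_membership
        self.last_heartbeat[node] = now

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def check(self, now):
        """Compute the live set and relay it to every node still alive."""
        live = {
            node
            for node, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout_s
        }
        for node in live:
            self.subscribers[node](set(live))
        return live


views = {}  # what each node currently believes the cluster looks like
store = ToyStatestore(timeout_s=10.0)
for name in ("node-a", "node-b", "node-c"):
    store.register(
        name, lambda live, name=name: views.__setitem__(name, live), now=0.0
    )

# node-c stops sending heartbeats; the other two keep reporting in.
store.heartbeat("node-a", now=8.0)
store.heartbeat("node-b", now=8.0)
live = store.check(now=12.0)
print(sorted(live))  # ['node-a', 'node-b']
```

After the check, node-a and node-b both hold a membership view that excludes node-c, which is what lets a coordinator skip the unreachable node when it plans the next query.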

3. Catalog Service

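The catalog service (the catalogd process) relays the metadata changes made by Impala SQL statements to all the nodes in a cluster.
This component, new in Impala 1.2, reduces the need for the REFRESH and INVALIDATE METADATA statements: they are no longer required after changes made through Impala itself, but they are still needed when tables or data files are changed outside Impala, for example through Hive.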
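
As a rough illustration of why a central relay removes the need for a manual refresh on each node, consider the toy model below. It is only a sketch under stated assumptions: ToyCatalogService, its per-node caches, and the version counter are hypothetical and do not reflect how catalogd is actually implemented.

```python
class ToyCatalogService:
    """Toy model: push every metadata change to all attached node caches."""

    def __init__(self):
        self.version = 0
        self.node_caches = {}  # node name -> that node's metadata cache

    def attach(self, node):
        self.node_caches[node] = {}
        return self.node_caches[node]

    def publish(self, table, columns):
        """Record a change and relay it to every node in one step."""
        self.version += 1
        for cache in self.node_caches.values():
            cache[table] = {"version": self.version, "columns": list(columns)}
        return self.version


catalog = ToyCatalogService()
cache_a = catalog.attach("node-a")
cache_b = catalog.attach("node-b")

# A CREATE TABLE issued through node-a becomes visible on node-b as well.
catalog.publish("sales", ["id", "amount"])
print(cache_b["sales"]["columns"])  # ['id', 'amount']
```

A change that bypasses publish, which is the analogue of editing a table through Hive, never reaches the caches, and that is the case where REFRESH or INVALIDATE METADATA is still needed.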
