Thursday, April 21, 2016

MapR Stream Workshop 2: auto.offset.reset

Theory:

Per Documentation for parameter auto.offset.reset:
   Specifies what MapR Streams should do when there is no initial offset, such as when a consumer starts reading from a partition.
  • earliest -- Reset the offset to the offset of the earliest message in the partition.
  • latest -- Reset the offset to the offset of the latest message in the partition.This is the default value.
  • none -- Throws a NoOffsetForPartitionException exception when the consumer next polls for messages in its subscription and no offset exists. The consumer must unsubscribe from the partition before polling functions correctly.
Any other value throws an error to the consumer.

Experiment:

1. Default value auto.offset.reset="latest"

With default code in SampleConsumer:
mapr openkb.stream.SampleConsumer
Consumer will start to fetch the latest messages in that topic:(Look at the "offset")
Consumed Record: ConsumerRecord(topic = /stream/s1:info, partition = 0, offset = 354989820, timestamp = 1461251182624, producer = root, key = null, value = Msg 0)
Consumed Record: ConsumerRecord(topic = /stream/s1:info, partition = 0, offset = 354989821, timestamp = 1461251182624, producer = root, key = null, value = Msg 1)
Consumed Record: ConsumerRecord(topic = /stream/s1:info, partition = 0, offset = 354989822, timestamp = 1461251182624, producer = root, key = null, value = Msg 5)
...
...
If there is no new messages coming in, then Consumer will wait there until the poll timeout is triggered.

 2. auto.offset.reset="earliest"

We just changed auto.offset.reset="earliest" in SampleConsumer code and then execute it:
mapr openkb.stream.SampleConsumer
Consumer will start to fetch the earliest messages in that topic:(Look at the "offset")
Consumed Record: ConsumerRecord(topic = /stream/s1:info, partition = 0, offset = 1, timestamp = 1460995818285, producer = root, key = null, value = Msg 0)
Consumed Record: ConsumerRecord(topic = /stream/s1:info, partition = 0, offset = 2, timestamp = 1460995818285, producer = root, key = null, value = Msg 4)
Consumed Record: ConsumerRecord(topic = /stream/s1:info, partition = 0, offset = 3, timestamp = 1460995818285, producer = root, key = null, value = Msg 1)
...
...

This is similar as a full table scan.


No comments:

Post a Comment

Popular Posts