The object is to obtain cards the sum of whose numerical values is as great as possible without exceeding 21. All face cards count as 10, and the ace can count either as 1 or 11. We consider the version in which each player competes independently against the dealer.
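A minimal sketch of the scoring rule, assuming cards are given as rank strings ('A', '2' to '10', 'J', 'Q', 'K'). The names `card_value` and `best_total` are hypothetical. An ace is stored as 1, and the total adds 10 for a single ace only when that keeps the hand within 21. Two aces can never both count as 11, since that alone would give 22.

```python
from typing import List

FACE_CARDS = {"J", "Q", "K"}

def card_value(rank: str) -> int:
    """Base value of a card: face cards count 10, an ace counts 1 (hypothetical helper)."""
    if rank in FACE_CARDS:
        return 10
    if rank == "A":
        return 1
    return int(rank)  # '2' through '10'

def best_total(ranks: List[str]) -> int:
    """Largest hand total, counting one ace as 11 only if that does not exceed 21."""
    total = sum(card_value(r) for r in ranks)
    if "A" in ranks and total + 10 <= 21:
        total += 10  # promote exactly one ace from 1 to 11
    return total

print(best_total(["A", "K"]))       # 21
print(best_total(["A", "9", "5"]))  # 15 (ace must count as 1)
print(best_total(["K", "Q", "5"]))  # 25 (over 21, a bust)
```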
The game begins with two cards dealt to both dealer and player. One of the dealer's cards is faceup and the other is facedown. If the player has 21 immediately (an ace and a ten-card), it is called a natural. He then wins unless the dealer also has a natural, in which case the game is a draw.
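The natural check can be sketched as follows, assuming cards are rank strings ('A', '2' to '10', 'J', 'Q', 'K'). The names `is_natural` and `natural_outcome` are hypothetical, and encoding a win as +1 and a draw as 0 is an assumption (it matches the reward convention Sutton and Barto use for this example). When the player has no natural, the sketch returns `None`, because the rules then continue with the player's turn.

```python
from typing import List, Optional

TEN_CARDS = {"10", "J", "Q", "K"}

def is_natural(ranks: List[str]) -> bool:
    """True for a two-card hand made of an ace and a ten-card."""
    return len(ranks) == 2 and "A" in ranks and any(r in TEN_CARDS for r in ranks)

def natural_outcome(player: List[str], dealer: List[str]) -> Optional[int]:
    """+1 if only the player has a natural, 0 if both do, None if the player has none."""
    if not is_natural(player):
        return None  # no natural: the game continues with the player's turn
    return 0 if is_natural(dealer) else 1

print(natural_outcome(["A", "K"], ["9", "7"]))   # 1
print(natural_outcome(["A", "Q"], ["10", "A"]))  # 0
print(natural_outcome(["9", "7"], ["A", "K"]))   # None
```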
If the player does not have a natural, then he can request additional cards, one by one (hits), until he either stops (sticks) or exceeds 21 (goes bust). If he goes bust, he loses; if he sticks, then it becomes the dealer's turn.
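The player's hit-or-stick loop can be sketched as below. This assumes cards are integer values 1 to 10 (ace stored as 1), that `draw` is any zero-argument function returning the next card value, and that the player follows a fixed threshold policy. The names `hand_total`, `player_turn`, and the default `stick_at=20` are hypothetical; the threshold echoes the simple "stick on 20 or 21" policy Sutton and Barto evaluate and is not a recommended strategy.

```python
from typing import Callable, List

def hand_total(cards: List[int]) -> int:
    """Hand sum with one ace counted as 11 when that does not exceed 21."""
    total = sum(cards)
    return total + 10 if 1 in cards and total + 10 <= 21 else total

def player_turn(cards: List[int], draw: Callable[[], int], stick_at: int = 20) -> int:
    """Hit one card at a time until the total reaches stick_at (stick) or exceeds 21 (bust).

    Assumes stick_at <= 21, so a bust total also ends the loop.
    Returns the final total; a value above 21 means the player went bust and loses.
    """
    cards = list(cards)
    while hand_total(cards) < stick_at:
        cards.append(draw())  # hit
    return hand_total(cards)

draws = iter([5, 9])
print(player_turn([10, 2], lambda: next(draws)))  # 26 (12 -> 17 -> 26, bust)
```

Passing `draw` as a parameter keeps the loop independent of how cards are generated, so a fixed sequence can stand in for a random deck when checking the logic.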
The dealer hits or sticks according to a fixed strategy without choice: he sticks on any sum of 17 or greater, and hits otherwise. If the dealer goes bust, then the player wins; otherwise, the outcome (win, lose, or draw) is determined by whose final sum is closer to 21.
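A sketch of the dealer's fixed rule and the final comparison, assuming integer card values 1 to 10 with the ace stored as 1 and a zero-argument `draw` function. The names `hand_total`, `dealer_turn`, and `outcome` are hypothetical, as is the +1/0/-1 result encoding. It also assumes the dealer's ace counts as 11 whenever that does not bust, so a soft 17 (ace plus 6) sticks. Since both sums are at most 21 when they are compared, "closer to 21" reduces to "larger".

```python
from typing import Callable, List

def hand_total(cards: List[int]) -> int:
    """Hand sum with one ace counted as 11 when that does not exceed 21."""
    total = sum(cards)
    return total + 10 if 1 in cards and total + 10 <= 21 else total

def dealer_turn(cards: List[int], draw: Callable[[], int]) -> int:
    """Dealer hits on any total below 17 and sticks on 17 or more; returns the final total."""
    cards = list(cards)
    while hand_total(cards) < 17:
        cards.append(draw())
    return hand_total(cards)

def outcome(player_total: int, dealer_total: int) -> int:
    """Result for a player who stuck (player_total <= 21): +1 win, 0 draw, -1 loss."""
    if dealer_total > 21:
        return 1  # dealer went bust
    # Both totals are at most 21, so the larger one is closer to 21.
    return (player_total > dealer_total) - (player_total < dealer_total)

draws = iter([3, 10])
final = dealer_turn([10, 2], lambda: next(draws))  # 12 -> 15 -> 25
print(final, outcome(18, final))  # 25 1
```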
We assume that cards are dealt from an infinite deck (i.e., with replacement), so that there is no advantage to keeping track of the cards already dealt.
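With replacement, every draw is independent and has the same distribution. The ace and each of 2 through 9 have probability 1/13, and a ten-valued card (10, J, Q, or K) has probability 4/13. A minimal sketch, assuming card values are integers with the ace stored as 1; `RANK_VALUES` and `draw_card` are hypothetical names.

```python
import random

# Ace (1), 2-9, and four ten-valued ranks (10, J, Q, K): 13 equally likely ranks.
RANK_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]

def draw_card(rng: random.Random) -> int:
    """Draw one card value from an infinite deck: every draw uses the same 13 ranks."""
    return rng.choice(RANK_VALUES)

rng = random.Random(0)
sample = [draw_card(rng) for _ in range(100_000)]
print(sample.count(10) / len(sample))  # close to 4/13, about 0.308
```

Because the distribution never changes, the cards already dealt carry no information about the next one, which is why counting cards gives no advantage here.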
If the player holds an ace that he could count as 11 without going bust, then the ace is said to be usable. In this case, it is always counted as 11.
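The usable-ace rule lets a player's hand be summarized by its current sum plus a flag saying whether an ace is being counted as 11. A minimal sketch, assuming integer card values with the ace stored as 1; `player_state` is a hypothetical name.

```python
from typing import List, Tuple

def player_state(cards: List[int]) -> Tuple[int, bool]:
    """Return (sum, usable_ace); an ace counts as 11 whenever that does not exceed 21."""
    total = sum(cards)
    usable = 1 in cards and total + 10 <= 21
    return (total + 10 if usable else total), usable

print(player_state([1, 6]))     # (17, True)
print(player_state([1, 6, 9]))  # (16, False): counting the ace as 11 would give 26
print(player_state([10, 7]))    # (17, False)
```

Because a usable ace can always fall back to counting as 1, a hand with a usable ace cannot go bust on the next hit; at worst the flag switches to `False`.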
(From Sutton and Barto, Reinforcement Learning: An Introduction.)