Update DEVELOPMENT_PLAN.md with internal API findings
@@ -16,6 +16,35 @@ To identify and, if feasible, implement a sustainable, quota-friendly, compliant
4. **Tooling:** Consider using tools like `mitmproxy` for more in-depth network traffic interception and analysis on a Linux system.
* **Expected Outcome:** A detailed understanding of YouTube's internal live chat data acquisition methods.

### Findings: Internal `POST /youtubei/v1/live_chat/get_live_chat` API

Our network analysis showed that YouTube's web client receives live chat messages by sending `POST` requests to `https://www.youtube.com/youtubei/v1/live_chat/get_live_chat`. This is a continuation-based polling mechanism: each response carries the token needed for the next request. It is neither a gRPC stream nor a simple public API call.

**Request Payload Key Elements:**
* `context`: Contains extensive client-specific information (e.g., `userAgent`, `clientName`, `clientVersion`, `visitorData`, `timeZone`, `browserName`, `browserVersion`). Mimicking this accurately is crucial.
* `continuation`: A string token that dictates which set of messages to fetch next. This token is received in the previous response.
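
As a rough illustration of the two elements above, the request body could be assembled along these lines. This is a minimal sketch, not the web client's actual logic: the function name `build_request_body`, the placement of the client fields under `context["client"]`, and the sample values are all assumptions that should be checked against a captured request.

```python
import json


def build_request_body(continuation, client_fields):
    """Assemble the JSON body for one get_live_chat request.

    `client_fields` holds the client-specific values listed above
    (userAgent, clientName, clientVersion, ...). Nesting them under
    context["client"] is an assumption to verify against a capture.
    """
    if not continuation:
        raise ValueError("a continuation token is required")
    return {
        "context": {"client": dict(client_fields)},
        "continuation": continuation,
    }


# Placeholder values for illustration only; real ones come from a captured request.
example_body = build_request_body(
    "PLACEHOLDER_TOKEN",
    {"clientName": "WEB", "clientVersion": "0.0", "timeZone": "UTC"},
)
print(json.dumps(example_body, indent=2))
```

Keeping the body construction in one small function makes it easier to adjust when the set of required `context` fields becomes clearer.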

**Response Payload Key Elements:**
* `continuationContents.liveChatContinuation.continuations`: Contains an `invalidationContinuationData` object with the *new* `continuation` token for the subsequent request.
* `actions`: An array containing `addChatItemAction` objects, each representing a chat message.
    * `addChatItemAction.item.liveChatTextMessageRenderer`: Contains the message details.
        * `message.runs[0].text`: The text of the first run of the message. A message can be split into several runs (for example, around emoji), so the full content is the concatenation of all runs.
        * `authorName.simpleText`: The author's display name.
        * `authorPhoto.thumbnails`: URLs for the author's profile picture.
        * `timestampUsec`: Timestamp of the message; as the name suggests, in microseconds.
        * `authorExternalChannelId`: The author's channel ID.
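
The field paths above can be turned into a small parser. The following is a sketch under stated assumptions, not a verified implementation: the function name `parse_live_chat_response` is hypothetical, `actions` is assumed to sit under `continuationContents.liveChatContinuation`, and the sample response is hand-written for illustration rather than captured data.

```python
def parse_live_chat_response(data):
    """Return (messages, next_continuation) from one decoded JSON response."""
    chat = data.get("continuationContents", {}).get("liveChatContinuation", {})

    # Take the first continuation token found in the `continuations` array.
    next_token = None
    for entry in chat.get("continuations", []):
        # Each entry wraps the token in an object such as invalidationContinuationData.
        for wrapper in entry.values():
            if isinstance(wrapper, dict) and "continuation" in wrapper:
                next_token = wrapper["continuation"]
                break
        if next_token:
            break

    messages = []
    # Assumption: `actions` lives under liveChatContinuation.
    for action in chat.get("actions", []):
        renderer = (
            action.get("addChatItemAction", {})
            .get("item", {})
            .get("liveChatTextMessageRenderer")
        )
        if renderer is None:
            continue  # not a plain text chat message
        runs = renderer.get("message", {}).get("runs", [])
        messages.append({
            # Join every run that has text instead of reading only runs[0].
            "text": "".join(run.get("text", "") for run in runs),
            "author": renderer.get("authorName", {}).get("simpleText"),
            "channel_id": renderer.get("authorExternalChannelId"),
            "timestamp_usec": renderer.get("timestampUsec"),
        })
    return messages, next_token


# Hand-written sample shaped like the fields listed above (not captured data).
sample = {"continuationContents": {"liveChatContinuation": {
    "continuations": [{"invalidationContinuationData": {"continuation": "NEXT"}}],
    "actions": [{"addChatItemAction": {"item": {"liveChatTextMessageRenderer": {
        "message": {"runs": [{"text": "hello "}, {"text": "world"}]},
        "authorName": {"simpleText": "Alice"},
        "authorExternalChannelId": "UC_PLACEHOLDER",
        "timestampUsec": "1700000000000000",
    }}}}],
}}}
print(parse_live_chat_response(sample))
```

Using `.get()` with defaults at every level is a deliberate choice here: because the endpoint is undocumented, a missing key should yield an empty result rather than an exception.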

**Implications:**
* **Undocumented API:** This is an internal, undocumented API endpoint that can change without notice, which makes any client built on it fragile and costly to maintain.
* **Compliance Risk:** Using undocumented internal APIs is generally against YouTube's Terms of Service, so this approach may not satisfy the plan's compliance goal.
* **Authentication Challenge:** The captured requests include an `Authorization` header (e.g., `SAPISIDHASH`) and `Cookie` headers, indicating that mimicking this client will require managing a logged-in YouTube session. That is complex and prone to breaking.

### Next Research Steps for Phase 1:

1. **Authentication Research:** Investigate methods to obtain and maintain valid `Authorization` and `Cookie` headers programmatically for a long-running application. This is a major hurdle when relying on internal APIs.
2. **Mimicking Client Context:** Determine which parts of the `context` object are essential and how to generate/maintain them to avoid being blocked.
3. **Python Implementation Strategy:** Outline how to construct these `POST` requests, parse the JSON responses, and extract chat messages and the next `continuation` token.
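
One possible shape for step 3 is a loop that follows the continuation chain while keeping the network call and the parsing pluggable. This is only a sketch: `poll_chat`, `post`, and `extract` are hypothetical names, the transport is injected so the loop can be exercised without contacting YouTube, and no real headers or authentication are shown.

```python
import time


def poll_chat(post, first_token, extract, max_polls=3, delay_s=0.0):
    """Follow the continuation chain for up to `max_polls` requests.

    post(token)       -> decoded JSON response (network call supplied by the caller)
    extract(response) -> (messages, next_token)
    """
    token = first_token
    collected = []
    for _ in range(max_polls):
        if not token:
            break  # the last response carried no continuation token
        response = post(token)
        messages, token = extract(response)
        collected.extend(messages)
        if delay_s:
            time.sleep(delay_s)  # pause between polls
    return collected


# Offline demonstration with a fake transport (illustrative data only).
fake_pages = {"t0": (["first"], "t1"), "t1": (["second"], None)}
print(poll_chat(lambda tok: fake_pages[tok], "t0", lambda resp: resp, max_polls=5))
```

In a real client, `post` would wrap an HTTP library call that sends the request body and the headers from step 1, and `extract` would be the response parser. Injecting both keeps the polling logic testable while those pieces are still being researched.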

## Phase 2: Re-exploration of YouTube Data API v3 (Creative Use)

* **Objective:** Explore creative, highly optimized uses of the existing REST API that might offer better sustainability, even if not truly event-driven.