<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Web3 Technical Topics Explained]]></title><description><![CDATA[I break down web3 technical topics using analogies you can relate to and help you digest with much ease...]]></description><link>https://thedataengineerblog.com</link><generator>RSS for Node</generator><lastBuildDate>Mon, 13 Apr 2026 10:57:46 GMT</lastBuildDate><atom:link href="https://thedataengineerblog.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How to Install and Set the Latest Python Version as Default on macOS Sequoia and Sonoma]]></title><description><![CDATA[If you're a Mac user struggling to install the latest Python version and make it your system's default, here is an easy to follow solution for you. 
Managing multiple Python versions on macOS can be a bit tricky, especially when you need the latest f...]]></description><link>https://thedataengineerblog.com/how-to-install-and-set-the-latest-python-version-as-default-on-macos-sequoia-and-sonoma</link><guid isPermaLink="true">https://thedataengineerblog.com/how-to-install-and-set-the-latest-python-version-as-default-on-macos-sequoia-and-sonoma</guid><category><![CDATA[Python]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[installation guide]]></category><category><![CDATA[pyenv]]></category><category><![CDATA[sonoma]]></category><category><![CDATA[Apple]]></category><category><![CDATA[python beginner]]></category><dc:creator><![CDATA[Tony Kipkemboi]]></dc:creator><pubDate>Thu, 19 Sep 2024 19:55:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1726775069683/0dbd9c0d-9d3b-41b1-850f-1b112b28f799.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you're a Mac user struggling to install the latest Python version and make it your system's default, here is an easy to follow solution for you. </p>
<p>Managing multiple Python versions on macOS can be a bit tricky, especially when you need the latest features or compatibility with new libraries. This tutorial will guide you through the process step-by-step, ensuring you have the latest Python version up and running as your default interpreter.</p>
<h2 id="heading-why-update-python-on-your-mac">Why Update Python on Your Mac?</h2>
<p>macOS comes with a pre-installed version of Python, but it's often outdated and not suitable for development purposes. Installing the latest Python version allows you to:</p>
<ul>
<li>Access new language features.</li>
<li>Ensure compatibility with the latest libraries and frameworks.</li>
<li>Improve performance and security.</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><strong>macOS Sonoma or Sequoia</strong>: While these steps should work on earlier versions, they are tailored for macOS Sonoma and Sequoia.</li>
<li><strong>Homebrew Installed</strong>: Homebrew is a package manager for macOS that simplifies the installation of software.</li>
</ul>
<h2 id="heading-step-by-step-guide">Step-by-Step Guide</h2>
<h4 id="heading-1-check-your-current-python-version">1. Check Your Current Python Version</h4>
<p>Before making any changes, it's good to know which Python version you're currently using.</p>
<pre><code class="lang-bash">python3 --version
</code></pre>
<p><strong>Example Output</strong>:</p>
<pre><code class="lang-bash">Python 3.9.6
</code></pre>
<h4 id="heading-2-install-homebrew-if-not-already-installed">2. Install Homebrew (If not already installed)</h4>
<p>If you don't have Homebrew installed, you can install it using the following command:</p>
<pre><code class="lang-bash">/bin/bash -c <span class="hljs-string">"<span class="hljs-subst">$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)</span>"</span>
</code></pre>
<p>For more detailed instructions, visit the <a target="_blank" href="https://brew.sh">Homebrew website</a>.</p>
<h4 id="heading-3-update-homebrew">3. Update Homebrew</h4>
<p>Before installing new packages, ensure Homebrew is up to date:</p>
<pre><code class="lang-bash">brew update
</code></pre>
<h4 id="heading-4-install-the-latest-python-version">4. Install the Latest Python Version</h4>
<p>As of this writing, the latest stable Python version is <strong>Python 3.12</strong>. Install it using Homebrew:</p>
<pre><code class="lang-bash">brew install python@3.12
</code></pre>
<p>This command downloads and installs Python 3.12 and its associated tools.</p>
<h4 id="heading-5-adjust-your-path-environment-variable">5. Adjust Your PATH Environment Variable</h4>
<p>To make sure your system uses the newly installed Python version by default, you need to adjust your <code>PATH</code> environment variable.</p>
<p><strong>Edit Your Shell Configuration File</strong>
Depending on the shell you're using (likely <code>zsh</code> or <code>bash</code>), you'll need to edit either <code>~/.zshrc</code> or <code>~/.bash_profile</code>.</p>
<p><strong>For Zsh (default on macOS Catalina and later)</strong>:</p>
<pre><code class="lang-bash">nano ~/.zshrc
</code></pre>
<p><strong>For Bash</strong>:</p>
<pre><code class="lang-bash">nano ~/.bash_profile
</code></pre>
<p><strong>Add Homebrew's Python to Your PATH</strong>
Add the following line at the top of the file:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> PATH=<span class="hljs-string">"/opt/homebrew/bin:<span class="hljs-variable">$PATH</span>"</span>
</code></pre>
<p><strong>Note</strong>: On Intel-based Macs, use <code>/usr/local/bin</code> instead of <code>/opt/homebrew/bin</code>.</p>
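<p>If you're not sure which prefix applies to your machine, you can check the CPU architecture from Python. A minimal sketch (the helper name <code>brew_bin_dir</code> is mine, not part of any tool):</p>

```python
import platform

def brew_bin_dir(machine: str) -> str:
    """Return the Homebrew bin directory for a given CPU architecture."""
    # Apple Silicon Macs report "arm64"; Intel Macs report "x86_64".
    return "/opt/homebrew/bin" if machine == "arm64" else "/usr/local/bin"

# Check the current machine's architecture and the matching prefix.
print(platform.machine(), "->", brew_bin_dir(platform.machine()))
```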
<p><strong>Save and Exit</strong></p>
<ul>
<li>In Nano editor, press <code>CTRL + X</code>, then <code>Y</code>, and hit <code>Enter</code> to save the changes.</li>
</ul>
<p><strong>Reload Your Shell Configuration</strong>
Apply the changes immediately by running:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">source</span> ~/.zshrc
</code></pre>
<p>Or for Bash:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">source</span> ~/.bash_profile
</code></pre>
<h4 id="heading-6-verify-the-installation">6. Verify the Installation</h4>
<p>Check that <code>python3</code> now points to the latest version:</p>
<pre><code class="lang-bash">python3 --version
</code></pre>
<p><strong>Expected Output</strong>:</p>
<pre><code class="lang-bash">Python 3.12.0
</code></pre>
<h4 id="heading-7-check-the-python-executable-path">7. Check the Python Executable Path</h4>
<p>Ensure that the <code>python3</code> command is pointing to the correct executable:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">which</span> python3
</code></pre>
<p>Expected Output:</p>
<pre><code class="lang-bash">/opt/homebrew/bin/python3
</code></pre>
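<p>You can also confirm from inside Python itself which interpreter is running, which is handy when the shell output is ambiguous. A small sketch:</p>

```python
import sys

# Show which binary is actually running and its version.
print(sys.executable)
print(".".join(map(str, sys.version_info[:3])))

# A non-fatal check: warn if an older interpreter is still first on PATH.
if sys.version_info >= (3, 12):
    print("python3 points at a current interpreter")
else:
    print("an older interpreter is still first on your PATH")
```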
<h4 id="heading-8-handling-multiple-python-versions">8. Handling Multiple Python Versions</h4>
<p>If you have multiple Python versions installed, you might want to manage them without uninstalling older versions.</p>
<p><strong>Use Python Version Managers</strong>
Consider using a Python version manager like <code>pyenv</code> to switch between different Python versions easily.</p>
<p><strong>Install <code>pyenv</code> via Homebrew</strong>:</p>
<pre><code class="lang-bash">brew install pyenv
</code></pre>
<p><strong>Install Python 3.12 Using <code>pyenv</code></strong>:</p>
<pre><code class="lang-bash">pyenv install 3.12.0
</code></pre>
<p><strong>Set Global Python Version</strong>:</p>
<pre><code class="lang-bash">pyenv global 3.12.0
</code></pre>
<h4 id="heading-9-optional-uninstall-older-python-versions">9. Optional: Uninstall Older Python Versions</h4>
<p>If you're certain you no longer need the older Python version, you can uninstall it to free up space.</p>
<p><strong>Uninstall via Homebrew</strong>:</p>
<pre><code class="lang-bash">brew uninstall python@3.9
</code></pre>
<p><strong>Note</strong>: If the older Python version wasn't installed via Homebrew, you'll need to remove it manually, which requires caution to avoid deleting system-critical files.</p>
<h4 id="heading-10-troubleshooting-tips">10. Troubleshooting Tips</h4>
<ul>
<li><strong>Command not found errors</strong>: If you encounter errors like <code>command not found</code>, ensure that you've correctly updated your <code>PATH</code> and reloaded your shell configuration.</li>
<li><strong>Permission issues</strong>: Avoid using <code>sudo</code> with <code>brew</code> commands, as Homebrew manages permissions internally.</li>
<li><strong>Conflicting Python versions</strong>: Use <code>which -a python3</code> to list all <code>python3</code> executables in your <code>PATH</code> and identify any conflicts.</li>
</ul>
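<p>If you'd rather inspect conflicts programmatically, the same <code>which -a</code> idea can be reproduced in a few lines of Python. A sketch (the helper name <code>find_all</code> is mine):</p>

```python
import os

def find_all(cmd: str) -> list[str]:
    """Mimic `which -a`: list every match for cmd on PATH, in order."""
    hits = []
    for d in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(d, cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits

for path in find_all("python3"):
    print(path)  # the first entry is the one your shell will run
```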
<h2 id="heading-conclusion">Conclusion</h2>
<p>By following these steps, you should have the latest Python version installed on your Mac and set as the default interpreter. This setup will allow you to take advantage of the newest Python features and ensure compatibility with up-to-date libraries.</p>
<h2 id="heading-additional-resources">Additional Resources</h2>
<ul>
<li>Homebrew Documentation: <a target="_blank" href="https://docs.brew.sh/">https://docs.brew.sh/</a></li>
<li>Python Downloads: <a target="_blank" href="https://www.python.org/downloads/">https://www.python.org/downloads/</a></li>
<li>pyenv GitHub Repository: <a target="_blank" href="https://github.com/pyenv/pyenv">https://github.com/pyenv/pyenv</a></li>
</ul>
<p><em>Feel free to share your thoughts or ask questions in the comments below!</em></p>
<p><em>Happy coding!</em></p>
]]></content:encoded></item><item><title><![CDATA[What is an API?]]></title><description><![CDATA[What are APIs and why do they matter?
Imagine you’re ordering food at a restaurant. You look at the menu, pick your dishes, and tell the waiter your order. The waiter then goes to the kitchen, puts in your order, and brings back the prepared dishes.
...]]></description><link>https://thedataengineerblog.com/what-is-an-api-0743f6189cbb</link><guid isPermaLink="true">https://thedataengineerblog.com/what-is-an-api-0743f6189cbb</guid><category><![CDATA[APIs]]></category><category><![CDATA[software development]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[data]]></category><category><![CDATA[REST API]]></category><category><![CDATA[GraphQL]]></category><category><![CDATA[Hashnode]]></category><category><![CDATA[technology]]></category><dc:creator><![CDATA[Tony Kipkemboi]]></dc:creator><pubDate>Fri, 29 Sep 2023 17:42:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/uIE9f6gV8rI/upload/701c54ad955f448876cb92ea1b0c873c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-what-are-apis-and-why-do-they-matter">What are APIs and why do they matter?</h3>
<p>Imagine you’re ordering food at a restaurant. You look at the menu, pick your dishes, and tell the waiter your order. The waiter then goes to the kitchen, puts in your order, and brings back the prepared dishes.</p>
<p>The waiter serves as an interface between you and the kitchen. You don’t need to know how the kitchen prepares the food — you simply rely on the waiter to communicate your order and deliver the results.</p>
<p>This is similar to how APIs work in software. The API serves as an interface between two applications. One app makes a request to the API (the order), without needing to know how it’s implemented on the server side. The API then returns the response (the dishes).</p>
<p>For example, a weather app shows forecasts without knowing how weather data is aggregated and analyzed on the back-end. It simply calls the weather API to get the needed results.</p>
<p>So in essence, APIs act as intermediaries that handle communication and data exchange between different applications, abstracting away underlying complexities.</p>
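<p>To make the analogy concrete, here is a toy sketch in Python with no real network involved; the "kitchen" function stands in for the server-side implementation the client never sees, and all the names are invented for illustration:</p>

```python
def kitchen_prepare(dish: str) -> str:
    """Server-side implementation detail: the client never calls this directly."""
    return f"plate of {dish}"

def waiter_api(order: str) -> dict:
    """The 'waiter': a stable interface that hides how the kitchen works."""
    return {"status": 200, "body": kitchen_prepare(order)}

# The client only knows the interface, not the implementation.
response = waiter_api("pasta")
print(response)  # {'status': 200, 'body': 'plate of pasta'}
```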
<h3 id="heading-a-brief-history-of-api-development">A brief history of API development</h3>
<p>APIs have evolved significantly from their early beginnings to the integral role they play today.</p>
<p>In the 1960s and 1970s, APIs were mostly internal, providing connectivity between mainframe systems and custom software within organizations. For example, <a target="_blank" href="https://en.wikipedia.org/wiki/Sabre_%28travel_reservation_system%29#:~:text=Sabre%20Global%20Distribution%20System%2C%20owned%20by%20Sabre,independent%20of%20the%20airline%20in%20March%202000."><strong>American Airlines</strong></a> developed an API in the 1960s for its Sabre airline reservation system.</p>
<p>The advent of service-oriented architecture in the 1990s led to the growth of web APIs, also called web services. Companies like eBay (1997), Amazon (2002), PayPal (2000), and Salesforce (2000) offered public APIs for payments, shopping carts, and infrastructure services.</p>
<p>The launch of the Google Maps API in 2005 and the Twitter API in 2006 accelerated the opening of public web APIs. They enabled new mashups and applications built on established platforms. Facebook, YouTube, and others followed this open API model.</p>
<p>The mobile app explosion in the late 2000s further drove API adoption. Uber (2009), Airbnb (2008), and Instagram (2010) relied on APIs to connect mobile apps to back-end services. The client-server separation became a standard pattern.</p>
<p>Today, API-first companies like Twilio (2008), Stripe (2010), and Plaid (2013) are disrupting industries by offering core functionality through APIs rather than traditional applications. The API economy continues to thrive.</p>
<h3 id="heading-key-api-concepts-and-architecture">Key API concepts and architecture</h3>
<p>While APIs come in many forms, there are some common architectural concepts and components that apply to many web-based APIs.</p>
<p>At a high level, APIs allow client applications to access data or functionality from a server via API calls. The client makes requests to the API’s endpoints (URLs) and receives back responses.</p>
<p>For web APIs, requests and responses are typically sent over HTTP or HTTPS. APIs use HTTP request methods like <strong><em>GET</em></strong>, <strong><em>POST</em></strong>, <strong><em>PUT</em></strong>, and <strong><em>DELETE</em></strong> to perform operations. Response codes like <strong><em>200</em> OK</strong>, <strong><em>400</em> Bad Request</strong>, and <strong><em>500</em> Internal Server Error</strong> indicate the request status.</p>
<p>Most modern web APIs return data in lightweight JSON format rather than XML. Developer documentation and SDKs make it easier to integrate with APIs.</p>
<p>Other common API architecture components include:</p>
<ul>
<li><p>Authentication like API keys to identify applications</p>
</li>
<li><p>Rate limiting to prevent abuse</p>
</li>
<li><p>Versioning to evolve APIs without breaking changes</p>
</li>
<li><p>Caching to improve performance</p>
</li>
<li><p>Status monitoring to track uptime</p>
</li>
</ul>
<p>These architectural concepts power the seamless exchange of data between applications through modern web APIs.</p>
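<p>These pieces come together in a typical request/response cycle. A minimal sketch of handling a JSON API response in Python (the payload and status code here are made up for illustration):</p>

```python
import json

# A made-up raw response body, as a server might return it.
raw_body = '{"city": "Nairobi", "temp_c": 24, "conditions": "sunny"}'
status_code = 200

if 200 <= status_code < 300:
    data = json.loads(raw_body)        # JSON text -> Python dict
    print(data["city"], data["temp_c"])
elif status_code == 400:
    print("Bad Request: fix the request parameters")
else:
    print("Server error or unexpected status:", status_code)
```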
<h3 id="heading-benefits-of-apis">Benefits of APIs</h3>
<p>There are many benefits to building software and connecting systems using APIs:</p>
<ol>
<li><p><strong>Modularity</strong> — APIs allow code to be separated into reusable modules with clearly defined interfaces for communication. This breaks down system complexity.</p>
</li>
<li><p><strong>Developer Experience</strong> — Well-designed APIs improve developer experience by being easy to integrate with. API documentation and SDKs make APIs more accessible.</p>
</li>
<li><p><strong>Scalability</strong> — APIs enable systems to scale by separating front-end and back-end components across servers. Back-ends can be expanded as needed.</p>
</li>
<li><p><strong>Code Reuse</strong> — APIs allow code to be reused across multiple platforms. For example, a payment API can be integrated across web, mobile, etc.</p>
</li>
<li><p><strong>Innovation</strong> — Public APIs enable new products and services by allowing developers to tap into functionality. The API economy thrives on this ecosystem.</p>
</li>
</ol>
<p>There are also business benefits like additional revenue streams from API monetization and faster time-to-market by building on existing APIs. Overall, APIs done well provide a multitude of architecture and business benefits.</p>
<h3 id="heading-introduction-to-rest-and-graphql">Introduction to REST and GraphQL</h3>
<p>There are two leading architectural approaches for designing web APIs — REST and GraphQL.</p>
<p><strong>REST (Representational State Transfer)</strong> is one of the most prevalent API architectures. REST APIs rely on standard HTTP methods and status codes to access and manipulate textual data representations.</p>
<p>In a REST API, clients make requests to predefined endpoints at URLs representing individual resources. REST uses HTTP features like caching, content negotiation, and hypermedia controls.</p>
<p><strong>GraphQL</strong> is a newer API architecture that was developed to address shortcomings of REST. Instead of accessing pre-set endpoints, GraphQL APIs allow clients to declare and retrieve structured data through a single endpoint using a query language.</p>
<p>In GraphQL, the client requests define the structure of the response rather than the server. This allows retrieving data in flexible shapes optimized for the client. GraphQL also uses a strongly typed schema system.</p>
<p>While REST is better suited for simple use cases, GraphQL improves efficiency for accessing complex nested data and enabling frequent client-driven changes. We’ll dive deeper into the technical comparison in upcoming posts.</p>
<p>This introduces the high-level concepts of REST and GraphQL. Both play major roles in modern API architecture and have their merits depending on your use case.</p>
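<p>The difference is easiest to see in the shape of the requests themselves. A sketch (the endpoint paths and fields are invented for illustration):</p>

```python
# REST: the URL identifies the resource; the server decides the response shape.
rest_request = "GET /users/42/posts?limit=2"

# GraphQL: one endpoint; the client declares exactly which fields it wants.
graphql_request = {
    "url": "POST /graphql",
    "query": """
    {
      user(id: 42) {
        name
        posts(limit: 2) { title }
      }
    }
    """,
}

print(rest_request)
print(graphql_request["query"].strip())
```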
<h3 id="heading-summary">Summary</h3>
<p>APIs have become a critical part of enabling functionality, data access, and connectivity in the software landscape. As usage continues to grow, API architecture has evolved from early internal origins to the wide array of public APIs fueling innovative applications we see today.</p>
<p>Core API architecture concepts like endpoints, HTTP methods, status codes, and authentication power the seamless data exchanges between clients and servers. Benefits like modularity, code reuse, and scalability have made APIs integral to modern software design.</p>
<p>REST and GraphQL have emerged as two leading architectural styles for crafting APIs optimized for different use cases. REST’s simplicity and wide adoption make it a solid default choice, while GraphQL offers efficiency gains for complex data structures.</p>
]]></content:encoded></item><item><title><![CDATA[Build a Streamr Node Dashboard with Streamlit using Python]]></title><description><![CDATA[Let’s say this is the first time you’ve heard or read about Streamr Network and Streamlit; if you are already aware, then bear with me as I give a TLDR:

Streamr Network: is a decentralized, peer-to-peer network for real-time data publishing and subs...]]></description><link>https://thedataengineerblog.com/build-a-streamr-node-dashboard-with-streamlit-using-python-bbf0b52a29cb</link><guid isPermaLink="true">https://thedataengineerblog.com/build-a-streamr-node-dashboard-with-streamlit-using-python-bbf0b52a29cb</guid><category><![CDATA[streamr]]></category><category><![CDATA[streamlit]]></category><category><![CDATA[dashboard]]></category><category><![CDATA[Python]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[data]]></category><category><![CDATA[realtime]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Tony Kipkemboi]]></dc:creator><pubDate>Thu, 22 Jun 2023 21:17:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1708532313970/55e13569-01ae-4a3e-8702-63dcd8dc1694.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let’s say this is the first time you’ve heard or read about Streamr Network and Streamlit; if you are already aware, then bear with me as I give a TLDR:</p>
<ol>
<li><p><a target="_blank" href="https://streamr.network/?ref=thedataengineerblog.com"><strong>Streamr Network</strong></a><strong>:</strong> is a decentralized, peer-to-peer network for real-time data publishing and subscription, providing a scalable, robust, and secure platform for data exchange without reliance on a central server.</p>
</li>
<li><p><a target="_blank" href="https://streamlit.io/?ref=thedataengineerblog.com"><strong>Streamlit</strong></a><strong>:</strong> is an open-source Python library for rapidly creating and deploying interactive web apps for data science and machine learning without needing web development skills.</p>
</li>
</ol>
<p>Streamr Node Dashboard is an application built using Streamlit inspired by <a target="_blank" href="https://brubeckscan.app/?ref=thedataengineerblog.com">BrubeckScan</a> (R.I.P), a Streamr node and rewards monitoring dApp built and maintained by Streamr community member <a target="_blank" href="https://www.adamvo.dev/?ref=thedataengineerblog.com">Adam Phi Vo</a>. The application is built around the concept of a Streamr Node, an entity in the network that processes and stores data. We will go through the process of building this application step-by-step.</p>
<p>The main features of the application are:</p>
<ul>
<li><p>Fetching data from the Streamr API endpoints</p>
</li>
<li><p>Displaying details about a specific Streamr node</p>
</li>
<li><p>Displaying payouts and the latest claimed reward codes for a Streamr node</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708465293096/7e9d385f-f276-4643-aac6-44a71bf1703d.gif" alt class="image--center mx-auto" /></p>
<h3 id="heading-dependencies">Dependencies</h3>
<p>The dependencies for this project are fairly standard for a Python data app:</p>
<ul>
<li><p><code>concurrent.futures</code> - for running multiple requests concurrently</p>
</li>
<li><p><code>io</code> - for handling byte streams</p>
</li>
<li><p><code>logging</code> - for logging messages</p>
</li>
<li><p><code>math</code> and <code>re</code> - for numerical and regular expression operations, respectively</p>
</li>
<li><p><code>datetime</code> - for handling datetime objects</p>
</li>
<li><p><code>pytz</code> - for handling timezone conversions</p>
</li>
<li><p><code>requests</code> - for making HTTP requests</p>
</li>
<li><p><code>streamlit</code> - for the web app framework</p>
</li>
<li><p><code>PIL</code> (Pillow), <code>reportlab</code>, and <code>svglib</code> - for handling images and SVGs</p>
</li>
<li><p><code>config</code> - a custom module containing configuration parameters (like API base URLs)</p>
</li>
<li><pre><code class="lang-python">  <span class="hljs-keyword">import</span> logging
  <span class="hljs-keyword">import</span> math
  <span class="hljs-keyword">import</span> re
  <span class="hljs-keyword">import</span> pytz
  <span class="hljs-keyword">import</span> requests
  <span class="hljs-keyword">import</span> config
  <span class="hljs-keyword">import</span> streamlit <span class="hljs-keyword">as</span> st
  <span class="hljs-keyword">import</span> concurrent.futures
  <span class="hljs-keyword">import</span> io

  <span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
  <span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional
  <span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
  <span class="hljs-keyword">from</span> reportlab.graphics <span class="hljs-keyword">import</span> renderPM
  <span class="hljs-keyword">from</span> svglib.svglib <span class="hljs-keyword">import</span> svg2rlg
</code></pre>
</li>
</ul>
<h3 id="heading-streamlit-app-configuration">Streamlit App Configuration</h3>
<p>Streamlit’s <code>set_page_config</code> function is used to customize the app's page settings, including the title, icon, layout, initial sidebar state, and menu items.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Streamlit page config MUST be the first Streamlit command</span>
<span class="hljs-comment"># used in your app, and MUST only be set once</span>
st.set_page_config(
    page_title=<span class="hljs-string">"Streamr BrubeckScan Dashboard App"</span>,
    page_icon=<span class="hljs-string">":lightning:"</span>,
    layout=<span class="hljs-string">"wide"</span>,
    initial_sidebar_state=<span class="hljs-string">"expanded"</span>,
    menu_items={
        <span class="hljs-string">'Get help'</span>: <span class="hljs-string">'https://www.thedataengineerblog.com/'</span>,
        <span class="hljs-string">'About'</span>: <span class="hljs-string">"# This is a Streamlit clone version of the official Streamr BrubeckScan dashboard."</span>}
)

<span class="hljs-comment"># Set up logging</span>
logging.basicConfig(filename=<span class="hljs-string">'app.log'</span>, 
                    level=logging.INFO,
                    format=<span class="hljs-string">'%(asctime)s - %(levelname)s - %(message)s'</span>
)
</code></pre>
<h3 id="heading-fetching-data">Fetching Data</h3>
<p>The <code>fetch_data()</code> function fetches data from a given endpoint and handles any errors that might occur:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fetch_data</span>(<span class="hljs-params">endpoint: str</span>) -&gt; dict:</span>
    <span class="hljs-string">"""
    Fetch data from a given endpoint.

    Args:
    endpoint: The URL of the endpoint to fetch data from.

    Returns:
    The JSON response from the endpoint as a dictionary.
    Returns None if the request fails.
    """</span>
    <span class="hljs-keyword">try</span>:
        response = requests.get(endpoint)
        response.raise_for_status()
        <span class="hljs-keyword">return</span> response.json()
    <span class="hljs-keyword">except</span> requests.exceptions.RequestException <span class="hljs-keyword">as</span> e:
        logging.error(<span class="hljs-string">f"Request to <span class="hljs-subst">{endpoint}</span> failed: <span class="hljs-subst">{e}</span>"</span>)
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>
</code></pre>
<p>The <code>fetch_node_data</code> function is a specialized function for fetching data about a specific Streamr node from the Streamr API:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fetch_node_data</span>(<span class="hljs-params">node_address: str</span>) -&gt; dict:</span>
    <span class="hljs-string">"""
    Fetch data for a specific Streamr node.

    Args:
    node_address: The Ethereum address of the Streamr node.

    Returns:
    The data for the Streamr node as a dictionary.
    Returns None if the request fails.
    """</span>
    logging.info(<span class="hljs-string">f"Fetching node data for address <span class="hljs-subst">{node_address}</span>"</span>)
    <span class="hljs-keyword">return</span> fetch_data(<span class="hljs-string">f"<span class="hljs-subst">{config.API_BASE}</span>/nodes/<span class="hljs-subst">{node_address}</span>"</span>)
</code></pre>
<p>The <code>get_metrics_data</code> function fetches metrics data for a specific Streamr node. It uses a <code>ThreadPoolExecutor</code> from the <code>concurrent.futures</code> module to fetch data from multiple endpoints concurrently; this is overkill, but why not? 😂:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_metrics_data</span>(<span class="hljs-params">node_address: str</span>) -&gt; dict:</span>
    <span class="hljs-string">"""
    Fetch metrics data for a specific Streamr node.

    Args:
    node_address: The Ethereum address of the Streamr node.

    Returns:
    The metrics data for the Streamr node as a dictionary.
    Returns None if any of the requests fail.
    """</span>
    logging.info(<span class="hljs-string">f"Getting metrics data for node <span class="hljs-subst">{node_address}</span>"</span>)
    data = {
        <span class="hljs-string">"acc_rewards"</span>: <span class="hljs-string">f"<span class="hljs-subst">{config.DATA_REWARDS_BASE}</span>/<span class="hljs-subst">{node_address}</span>"</span>,
        <span class="hljs-string">"claimed_rewards"</span>: <span class="hljs-string">f"<span class="hljs-subst">{config.CLAIMED_REWARDS_BASE}</span>/<span class="hljs-subst">{node_address}</span>"</span>,
        <span class="hljs-string">"apr_apy"</span>: config.APR_APY_BASE
    }

    <span class="hljs-keyword">with</span> concurrent.futures.ThreadPoolExecutor() <span class="hljs-keyword">as</span> executor:
        future_to_url = {executor.submit(
            fetch_data, url): key <span class="hljs-keyword">for</span> key, url <span class="hljs-keyword">in</span> data.items()}
        results = {future_to_url[future]: future.result(
        ) <span class="hljs-keyword">for</span> future <span class="hljs-keyword">in</span> concurrent.futures.as_completed(future_to_url)}

    <span class="hljs-comment"># Exclude any endpoints that failed to respond</span>
    <span class="hljs-keyword">return</span> {k: v <span class="hljs-keyword">for</span> k, v <span class="hljs-keyword">in</span> results.items() <span class="hljs-keyword">if</span> v <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">None</span>}
</code></pre>
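<p>The submit-then-collect pattern above can feel dense on first read, so here is the same pattern with a toy function standing in for <code>fetch_data</code> (the <code>fake_fetch</code> name and <code>example.com</code> URLs are invented), so you can see the mechanics without any network calls:</p>

```python
import concurrent.futures

def fake_fetch(url: str) -> str:
    """Stand-in for fetch_data(): returns a canned 'response' for a URL."""
    return f"response from {url}"

endpoints = {
    "acc_rewards": "https://example.com/rewards",
    "claimed_rewards": "https://example.com/claimed",
}

with concurrent.futures.ThreadPoolExecutor() as executor:
    # Map each running future back to the key it was submitted for...
    future_to_key = {executor.submit(fake_fetch, url): key
                     for key, url in endpoints.items()}
    # ...then collect results as they complete, keyed by name.
    results = {future_to_key[f]: f.result()
               for f in concurrent.futures.as_completed(future_to_key)}

print(results["acc_rewards"])  # response from https://example.com/rewards
```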
<h3 id="heading-data-transformation">Data Transformation</h3>
<p>The functions below are handy for displaying times in a user’s local timezone. The goal is to make it easier for users to understand when the node events occurred in their timezone. We will use the functions to build the dashboard display later.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">convert_time_to_user_tz</span>(<span class="hljs-params">time_str: str, user_tz: str</span>) -&gt; str:</span>
    <span class="hljs-string">"""
    Convert a time string to a given timezone and format it.

    Args:
    time_str: The time string to convert. It should be in ISO 8601 format (i.e., "YYYY-MM-DDTHH:MM:SS.sssZ").
    user_tz: The timezone to convert the time to.

    Returns:
    The time converted to the user's timezone and formatted as a string.
    """</span>
    utc = pytz.timezone(<span class="hljs-string">'UTC'</span>)
    user_tz = pytz.timezone(user_tz)

    <span class="hljs-comment"># Convert the string to a datetime object</span>
    dt = datetime.strptime(time_str, <span class="hljs-string">"%Y-%m-%dT%H:%M:%S.%fZ"</span>)

    <span class="hljs-comment"># Set the timezone to UTC (since the original time is in UTC)</span>
    dt = utc.localize(dt)

    <span class="hljs-comment"># Convert to user selected timezone</span>
    dt_user_tz = dt.astimezone(user_tz)

    <span class="hljs-comment"># Format the time in the desired way (12-hour time)</span>
    formatted_time = dt_user_tz.strftime(<span class="hljs-string">"%I:%M:%S %p"</span>)

    <span class="hljs-keyword">return</span> formatted_time

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">convert_dt_to_user_tz</span>(<span class="hljs-params">dt: datetime, user_tz: str</span>) -&gt; str:</span>
    <span class="hljs-string">"""
    Convert a datetime object to a given timezone and format it.

    Args:
    dt: The datetime object to convert. It should be naive (i.e., timezone-unaware).
    user_tz: The timezone to convert the datetime to.

    Returns:
    The datetime converted to the user's timezone and formatted as a string.
    """</span>
    utc = pytz.timezone(<span class="hljs-string">'UTC'</span>)
    user_tz = pytz.timezone(user_tz)

    <span class="hljs-comment"># Set the timezone to UTC (since the original time is in UTC)</span>
    dt = utc.localize(dt)

    <span class="hljs-comment"># Convert to user selected timezone</span>
    dt_user_tz = dt.astimezone(user_tz)

    <span class="hljs-comment"># Format the datetime in the desired way (day, date, time, and timezone)</span>
    formatted_time = dt_user_tz.strftime(<span class="hljs-string">"%a, %d %b %Y %H:%M:%S %Z"</span>)

    <span class="hljs-keyword">return</span> formatted_time
</code></pre>
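<p>As a side note, if you would rather avoid the <code>pytz</code> dependency, the standard library's <code>zoneinfo</code> module (Python 3.9+) can do the same conversion. A minimal sketch — the helper name is mine, not part of the app:</p>
<pre><code class="lang-python">from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib alternative to pytz (Python 3.9+)

def convert_time_stdlib(time_str: str, user_tz: str):
    # Parse the ISO 8601 string, mark it as UTC, then convert and format
    dt = datetime.strptime(time_str, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return dt.astimezone(ZoneInfo(user_tz)).strftime("%I:%M:%S %p")

print(convert_time_stdlib("2023-06-13T14:30:00.000Z", "US/Eastern"))  # 10:30:00 AM (EDT)
</code></pre>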
<h3 id="heading-display-functions">Display Functions</h3>
<p>The <code>check_status</code> method takes the node's boolean status and returns a colored status label: <code>OK</code> means the node is up and operational, while <code>NO</code> means it is not.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_status</span>(<span class="hljs-params">status: bool</span>) -&gt; str:</span>
    <span class="hljs-string">"""
    Check the status of a Streamr node.

    Args:
    status: The status of the Streamr node.

    Returns:
    A string representing the status of the Streamr node.
    """</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">":green[OK]"</span> <span class="hljs-keyword">if</span> status <span class="hljs-keyword">else</span> <span class="hljs-string">":red[NO]"</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708465294920/72face6e-6778-4b7c-9230-607140d80fa8.png" alt="Node status" class="image--center mx-auto" /></p>
<p>The <code>display_node_info()</code> function shows specific metrics about the Streamr node.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_node_info</span>(<span class="hljs-params">node_address: str, node_data: dict</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    <span class="hljs-string">"""
    Display information about a specific Streamr node.

    Args:
    node_address: The Ethereum address of the Streamr node.
    node_data: The data for the Streamr node.

    Returns: 
    None
    """</span>
    st.divider()
    col1, col2, col3 = st.columns(<span class="hljs-number">3</span>)
    col1.image(node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'identiconURL'</span>],
               caption=<span class="hljs-string">'Node Identicon'</span>)
    col2.metric(<span class="hljs-string">"Node Address"</span>, node_address[:<span class="hljs-number">4</span>] + <span class="hljs-string">"..."</span>)
    col1.markdown(
        <span class="hljs-string">f"Status: **<span class="hljs-subst">{check_status(node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'status'</span>])}</span>**"</span>)
    col3.metric(<span class="hljs-string">"Staked $DATA"</span>, node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'staked'</span>])
    col2.metric(<span class="hljs-string">"To be Received"</span>, round(
        node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'toBeReceived'</span>], <span class="hljs-number">2</span>))
    col2.metric(<span class="hljs-string">"Total rewards"</span>, node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'rewards'</span>])
    col3.metric(<span class="hljs-string">"Claim Count"</span>, node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'claimCount'</span>])
    col3.metric(<span class="hljs-string">"Percentage of received claims %"</span>, round(
        node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'claimPercentage'</span>], <span class="hljs-number">2</span>))
</code></pre>
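<p>One thing to watch: <code>display_node_info()</code> indexes <code>node_data['data']['node'][...]</code> directly, so an unexpected API response raises a <code>KeyError</code>. If you want to be defensive, a small helper built on <code>dict.get</code> (hypothetical, not in the original code) can supply defaults instead:</p>
<pre><code class="lang-python">def get_node_field(node_data: dict, field: str, default=None):
    # Walk the nested response safely; return default if any level is missing
    return node_data.get('data', {}).get('node', {}).get(field, default)

sample = {'data': {'node': {'staked': 1000}}}
print(get_node_field(sample, 'staked'))     # 1000
print(get_node_field({}, 'claimCount', 0))  # 0
</code></pre>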
<p>When you run a Streamr Broker node, you receive <code>reward codes</code> at random intervals. These codes verify your node's activity and eligibility to receive rewards. For more detail, read <a target="_blank" href="https://docs.streamr.network/node-runners/mining-on-streamr/?ref=thedataengineerblog.com">Mining on Streamr</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708465296235/814c924a-29f7-4585-9462-4ed07e9bf3ea.jpeg" alt class="image--center mx-auto" /></p>
<p>The <code>display_latest_codes</code> function displays the latest reward codes received for the Streamr node.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_latest_codes</span>(<span class="hljs-params">node_data: dict, col: st.delta_generator.DeltaGenerator</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    <span class="hljs-string">"""
    Display the latest claimed reward codes for a Streamr node.

    Args:
    node_data: The data for the Streamr node.
    col: The Streamlit column to display the codes in.

    Returns:
    None
    """</span>
    all_timezones = pytz.all_timezones
    selected_tz = col.selectbox(<span class="hljs-string">"Select your timezone"</span>, 
                                all_timezones, 
                                index=all_timezones.index(<span class="hljs-string">'US/Eastern'</span>)
    )

    <span class="hljs-keyword">for</span> code <span class="hljs-keyword">in</span> node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'claimedRewardCodes'</span>]:
        formatted_time = convert_time_to_user_tz(code[<span class="hljs-string">'claimTime'</span>], 
                                                 selected_tz
        )
        col.write(<span class="hljs-string">f"<span class="hljs-subst">{code[<span class="hljs-string">'id'</span>]}</span> → <span class="hljs-subst">{formatted_time}</span>"</span>)
</code></pre>
<p>The <code>display_payouts</code> function shows the node's historical payouts: the $DATA token rewards earned by running the node.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_payouts</span>(<span class="hljs-params">node_data: dict</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    <span class="hljs-string">"""
    Display the payouts for a Streamr node.

    Args:
    node_data: The data for the Streamr node.

    Returns:
    None
    """</span>
    <span class="hljs-comment"># Create placeholders for headers</span>
    st.divider()
    header1, header2 = st.columns(<span class="hljs-number">2</span>)

    header1.header(<span class="hljs-string">"Payouts"</span>)
    header2.header(<span class="hljs-string">"Latest codes"</span>)

    <span class="hljs-comment"># Create columns for the contents</span>
    cols = st.columns([<span class="hljs-number">4</span>, <span class="hljs-number">2</span>, <span class="hljs-number">12</span>])

    utc = pytz.timezone(<span class="hljs-string">'UTC'</span>)
    payouts = node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'payouts'</span>]
    payouts.reverse()
    <span class="hljs-keyword">for</span> payout <span class="hljs-keyword">in</span> payouts:
        <span class="hljs-comment"># Convert the timestamp to a datetime object</span>
        payout_time = datetime.utcfromtimestamp(int(payout[<span class="hljs-string">'timestamp'</span>]))
        <span class="hljs-comment"># Use convert_dt_to_user_tz() since payout_time is already a datetime object</span>
        formatted_time = convert_dt_to_user_tz(payout_time, <span class="hljs-string">'UTC'</span>)
        rounded_payout = math.ceil(float(payout[<span class="hljs-string">'value'</span>]))

        <span class="hljs-comment"># Use the first column for the text and the second for the SVG</span>
        cols[<span class="hljs-number">0</span>].markdown(<span class="hljs-string">f"<span class="hljs-subst">{formatted_time}</span> → <span class="hljs-subst">{rounded_payout}</span>"</span>)
        display_svg(cols[<span class="hljs-number">1</span>], <span class="hljs-string">"assets/data_token.svg"</span>, width=<span class="hljs-number">20</span>, height=<span class="hljs-number">20</span>)

    <span class="hljs-comment"># Display the latest codes in the third column</span>
    display_latest_codes(node_data, cols[<span class="hljs-number">2</span>])
    st.divider()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708465297355/bd8ee437-6780-4f16-addc-82540a5f95fb.jpeg" alt="Historical payouts" class="image--center mx-auto" /></p>
<blockquote>
<p>💡 I used the <code>display_svg</code> function to display the Streamr SVG image beside the payout information. If you are following this tutorial step-by-step, make sure the SVG image is saved at the <code>assets/data_token.svg</code> path in your project directory.</p>
</blockquote>
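<p>A quick fail-fast check for the asset can save you a confusing traceback from <code>svg2rlg</code> later. This helper is my own addition, not part of the app:</p>
<pre><code class="lang-python">from pathlib import Path

def ensure_asset(path: str):
    # Fail fast with a clear message if the SVG asset is missing
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Missing asset: {p}. Save the $DATA token SVG there first.")
    return p
</code></pre>
<p>Call <code>ensure_asset("assets/data_token.svg")</code> once at startup, before any payout rows are rendered.</p>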
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_svg</span>(<span class="hljs-params">col: st.delta_generator.DeltaGenerator, path: str, width: Optional[int] = None, height: Optional[int] = None</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    <span class="hljs-string">"""
    Display an SVG image in a Streamlit column.

    Args:
        col: The Streamlit column to display the image in.
        path: The path to the SVG file.
        width: The width to resize the image to. If None, the original width of the image is used.
        height: The height to resize the image to. If None, the original height of the image is used.

    Returns:
    None
    """</span>
    <span class="hljs-comment"># Load the SVG file and convert it to a ReportLab Drawing</span>
    drawing = svg2rlg(path)

    <span class="hljs-comment"># Convert the Drawing to a PIL image</span>
    pil_image = renderPM.drawToPIL(drawing)

    <span class="hljs-comment"># Resize the image if width and height are provided</span>
    <span class="hljs-keyword">if</span> width <span class="hljs-keyword">and</span> height:
        pil_image = pil_image.resize((width, height))

    <span class="hljs-comment"># Convert the PIL image to an IO Bytes object so Streamlit can display it</span>
    image_stream = io.BytesIO()
    pil_image.save(image_stream, format=<span class="hljs-string">'PNG'</span>)
    pil_image = Image.open(image_stream)

    <span class="hljs-comment"># Display the image</span>
    col.image(pil_image, use_column_width=<span class="hljs-literal">False</span>)
</code></pre>
<h3 id="heading-final-step">Final Step</h3>
<p><strong>Nobody</strong>: …</p>
<p><strong>Python’s <code>if __name__ == "__main__"</code></strong>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708465298693/cc1cbe90-fe78-47bc-989d-ed63b7f9d183.gif" alt class="image--center mx-auto" /></p>
<p>Time to glue everything together:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> concurrent.futures
<span class="hljs-keyword">import</span> io
<span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">import</span> math
<span class="hljs-keyword">import</span> re
<span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional

<span class="hljs-keyword">import</span> pytz
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">import</span> streamlit <span class="hljs-keyword">as</span> st
<span class="hljs-keyword">import</span> streamlit.components.v1 <span class="hljs-keyword">as</span> components
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">from</span> reportlab.graphics <span class="hljs-keyword">import</span> renderPM
<span class="hljs-keyword">from</span> svglib.svglib <span class="hljs-keyword">import</span> svg2rlg

<span class="hljs-keyword">import</span> config

<span class="hljs-comment"># Streamlit page config MUST be the first Streamlit command</span>
<span class="hljs-comment"># used in your app, and MUST only be set once</span>
st.set_page_config(
    page_title=<span class="hljs-string">"Streamr Node Dashboard App"</span>,
    page_icon=<span class="hljs-string">":lightning:"</span>,
    layout=<span class="hljs-string">"wide"</span>,
    initial_sidebar_state=<span class="hljs-string">"expanded"</span>,
    menu_items={
        <span class="hljs-string">'Get help'</span>: <span class="hljs-string">'https://www.thedataengineerblog.com/'</span>,
        <span class="hljs-string">'About'</span>: <span class="hljs-string">"# This is a Streamlit clone version of the official Streamr BrubeckScan dashboard."</span>
    }
)

<span class="hljs-comment"># Set up logging</span>
logging.basicConfig(filename=<span class="hljs-string">'app.log'</span>, 
                    level=logging.INFO,
                    format=<span class="hljs-string">'%(asctime)s - %(levelname)s - %(message)s'</span>
)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fetch_data</span>(<span class="hljs-params">endpoint: str</span>) -&gt; dict:</span>
    <span class="hljs-string">"""
    Fetch data from a given endpoint.

    Args:
    endpoint: The URL of the endpoint to fetch data from.

    Returns:
    The JSON response from the endpoint as a dictionary. Returns None if the request fails.
    """</span>
    <span class="hljs-keyword">try</span>:
        response = requests.get(endpoint)
        response.raise_for_status()
        <span class="hljs-keyword">return</span> response.json()
    <span class="hljs-keyword">except</span> requests.exceptions.RequestException <span class="hljs-keyword">as</span> e:
        logging.error(<span class="hljs-string">f"Request to <span class="hljs-subst">{endpoint}</span> failed: <span class="hljs-subst">{e}</span>"</span>)
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fetch_node_data</span>(<span class="hljs-params">node_address: str</span>) -&gt; dict:</span>
    <span class="hljs-string">"""
    Fetch data for a specific Streamr node.

    Args:
    node_address: The Ethereum address of the Streamr node.

    Returns:
    The data for the Streamr node as a dictionary. Returns None if the request fails.
    """</span>
    logging.info(<span class="hljs-string">f"Fetching node data for address <span class="hljs-subst">{node_address}</span>"</span>)
    <span class="hljs-keyword">return</span> fetch_data(<span class="hljs-string">f"<span class="hljs-subst">{config.API_BASE}</span>/nodes/<span class="hljs-subst">{node_address}</span>"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_metrics_data</span>(<span class="hljs-params">node_address: str</span>) -&gt; dict:</span>
    <span class="hljs-string">"""
    Fetch metrics data for a specific Streamr node.

    Args:
    node_address: The Ethereum address of the Streamr node.

    Returns:
    The metrics data for the Streamr node as a dictionary. Returns None if any of the requests fail.
    """</span>
    logging.info(<span class="hljs-string">f"Getting metrics data for node <span class="hljs-subst">{node_address}</span>"</span>)
    data = {
        <span class="hljs-string">"acc_rewards"</span>: <span class="hljs-string">f"<span class="hljs-subst">{config.DATA_REWARDS_BASE}</span>/<span class="hljs-subst">{node_address}</span>"</span>,
        <span class="hljs-string">"claimed_rewards"</span>: <span class="hljs-string">f"<span class="hljs-subst">{config.CLAIMED_REWARDS_BASE}</span>/<span class="hljs-subst">{node_address}</span>"</span>,
        <span class="hljs-string">"apr_apy"</span>: config.APR_APY_BASE
    }

    <span class="hljs-keyword">with</span> concurrent.futures.ThreadPoolExecutor() <span class="hljs-keyword">as</span> executor:
        future_to_url = {executor.submit(
            fetch_data, url): key <span class="hljs-keyword">for</span> key, url <span class="hljs-keyword">in</span> data.items()}
        results = {future_to_url[future]: future.result(
            ) <span class="hljs-keyword">for</span> future <span class="hljs-keyword">in</span> concurrent.futures.as_completed(future_to_url)}

    <span class="hljs-comment"># Exclude any endpoints that failed to respond</span>
    <span class="hljs-keyword">return</span> {k: v <span class="hljs-keyword">for</span> k, v <span class="hljs-keyword">in</span> results.items() <span class="hljs-keyword">if</span> v <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">None</span>}

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">convert_time_to_user_tz</span>(<span class="hljs-params">time_str: str, user_tz: str</span>) -&gt; str:</span>
    <span class="hljs-string">"""
    Convert a time string to a given timezone and format it.

    Args:
    time_str: The time string to convert. It should be in ISO 8601 format (i.e., "YYYY-MM-DDTHH:MM:SS.sssZ").
    user_tz: The timezone to convert the time to.

    Returns:
    The time converted to the user's timezone and formatted as a string.
    """</span>
    utc = pytz.timezone(<span class="hljs-string">'UTC'</span>)
    user_tz = pytz.timezone(user_tz)

    <span class="hljs-comment"># Convert the string to a datetime object</span>
    dt = datetime.strptime(time_str, <span class="hljs-string">"%Y-%m-%dT%H:%M:%S.%fZ"</span>)

    <span class="hljs-comment"># Set the timezone to UTC (since the original time is in UTC)</span>
    dt = utc.localize(dt)

    <span class="hljs-comment"># Convert to user selected timezone</span>
    dt_user_tz = dt.astimezone(user_tz)

    <span class="hljs-comment"># Format the time in the desired way (12-hour time)</span>
    formatted_time = dt_user_tz.strftime(<span class="hljs-string">"%I:%M:%S %p"</span>)

    <span class="hljs-keyword">return</span> formatted_time

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">convert_dt_to_user_tz</span>(<span class="hljs-params">dt: datetime, user_tz: str</span>) -&gt; str:</span>
    <span class="hljs-string">"""
    Convert a datetime object to a given timezone and format it.

    Args:
    dt: The datetime object to convert. It should be naive (i.e., timezone-unaware).
    user_tz: The timezone to convert the datetime to.

    Returns:
    The datetime converted to the user's timezone and formatted as a string.
    """</span>
    utc = pytz.timezone(<span class="hljs-string">'UTC'</span>)
    user_tz = pytz.timezone(user_tz)

    <span class="hljs-comment"># Set the timezone to UTC (since the original time is in UTC)</span>
    dt = utc.localize(dt)

    <span class="hljs-comment"># Convert to user selected timezone</span>
    dt_user_tz = dt.astimezone(user_tz)

    <span class="hljs-comment"># Format the datetime in the desired way (day, date, time, and timezone)</span>
    formatted_time = dt_user_tz.strftime(<span class="hljs-string">"%a, %d %b %Y %H:%M:%S %Z"</span>)

    <span class="hljs-keyword">return</span> formatted_time

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_status</span>(<span class="hljs-params">status: bool</span>) -&gt; str:</span>
    <span class="hljs-string">"""
    Check the status of a Streamr node.

    Args:
    status: The status of the Streamr node.

    Returns:
    A string representing the status of the Streamr node.
    """</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">":green[OK]"</span> <span class="hljs-keyword">if</span> status <span class="hljs-keyword">else</span> <span class="hljs-string">":red[NO]"</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_node_info</span>(<span class="hljs-params">node_address: str, node_data: dict</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    <span class="hljs-string">"""
    Display information about a specific Streamr node.

    Args:
        node_address: The Ethereum address of the Streamr node.
        node_data: The data for the Streamr node.

    Returns:
    None
    """</span>
    st.divider()
    col1, col2, col3 = st.columns(<span class="hljs-number">3</span>)
    col1.image(node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'identiconURL'</span>],
               caption=<span class="hljs-string">'Node Identicon'</span>)
    col2.metric(<span class="hljs-string">"Node Address"</span>, node_address[:<span class="hljs-number">4</span>] + <span class="hljs-string">"..."</span>)
    col1.markdown(
        <span class="hljs-string">f"Status: **<span class="hljs-subst">{check_status(node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'status'</span>])}</span>**"</span>)
    col3.metric(<span class="hljs-string">"Staked $DATA"</span>, node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'staked'</span>])
    col2.metric(<span class="hljs-string">"To be Received"</span>, round(
        node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'toBeReceived'</span>], <span class="hljs-number">2</span>))
    col2.metric(<span class="hljs-string">"Total rewards"</span>, node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'rewards'</span>])
    col3.metric(<span class="hljs-string">"Claim Count"</span>, node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'claimCount'</span>])
    col3.metric(<span class="hljs-string">"Percentage of received claims %"</span>, round(
        node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'claimPercentage'</span>], <span class="hljs-number">2</span>))

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_latest_codes</span>(<span class="hljs-params">node_data: dict, col: st.delta_generator.DeltaGenerator</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    <span class="hljs-string">"""
    Display the latest claimed reward codes for a Streamr node.

    Args:
    node_data: The data for the Streamr node.
    col: The Streamlit column to display the codes in.

    Returns:
    None
    """</span>
    all_timezones = pytz.all_timezones
    selected_tz = col.selectbox(
        <span class="hljs-string">"Select your timezone"</span>, all_timezones, index=all_timezones.index(<span class="hljs-string">'US/Eastern'</span>))

    <span class="hljs-keyword">for</span> code <span class="hljs-keyword">in</span> node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'claimedRewardCodes'</span>]:
        formatted_time = convert_time_to_user_tz(
            code[<span class="hljs-string">'claimTime'</span>], selected_tz)
        col.write(<span class="hljs-string">f"<span class="hljs-subst">{code[<span class="hljs-string">'id'</span>]}</span> → <span class="hljs-subst">{formatted_time}</span>"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_svg</span>(<span class="hljs-params">col: st.delta_generator.DeltaGenerator, path: str, width: Optional[int] = None, height: Optional[int] = None</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    <span class="hljs-string">"""
    Display an SVG image in a Streamlit column.

    Args:
    col: The Streamlit column to display the image in.
    path: The path to the SVG file.
    width: The width to resize the image to. If None, the original width of the image is used.
    height: The height to resize the image to. If None, the original height of the image is used.

    Returns:
    None
    """</span>
    <span class="hljs-comment"># Load the SVG file and convert it to a ReportLab Drawing</span>
    drawing = svg2rlg(path)

    <span class="hljs-comment"># Convert the Drawing to a PIL image</span>
    pil_image = renderPM.drawToPIL(drawing)

    <span class="hljs-comment"># Resize the image if width and height are provided</span>
    <span class="hljs-keyword">if</span> width <span class="hljs-keyword">and</span> height:
        pil_image = pil_image.resize((width, height))

    <span class="hljs-comment"># Convert the PIL image to an IO Bytes object so Streamlit can display it</span>
    image_stream = io.BytesIO()
    pil_image.save(image_stream, format=<span class="hljs-string">'PNG'</span>)
    pil_image = Image.open(image_stream)

    <span class="hljs-comment"># Display the image</span>
    col.image(pil_image, use_column_width=<span class="hljs-literal">False</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_payouts</span>(<span class="hljs-params">node_data: dict</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    <span class="hljs-string">"""
    Display the payouts for a Streamr node.

    Args:
    node_data: The data for the Streamr node.

    Returns:
    None
    """</span>
    <span class="hljs-comment"># Create placeholders for headers</span>
    st.divider()
    header1, header2 = st.columns(<span class="hljs-number">2</span>)

    header1.header(<span class="hljs-string">"Payouts"</span>)
    header2.header(<span class="hljs-string">"Latest codes"</span>)

    <span class="hljs-comment"># Create columns for the contents</span>
    cols = st.columns([<span class="hljs-number">4</span>, <span class="hljs-number">2</span>, <span class="hljs-number">12</span>])

    utc = pytz.timezone(<span class="hljs-string">'UTC'</span>)
    payouts = node_data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'node'</span>][<span class="hljs-string">'payouts'</span>]
    payouts.reverse()
    <span class="hljs-keyword">for</span> payout <span class="hljs-keyword">in</span> payouts:
        <span class="hljs-comment"># Convert the timestamp to a datetime object</span>
        payout_time = datetime.utcfromtimestamp(int(payout[<span class="hljs-string">'timestamp'</span>]))
        <span class="hljs-comment"># Use convert_dt_to_user_tz() since payout_time is already a datetime object</span>
        formatted_time = convert_dt_to_user_tz(payout_time, <span class="hljs-string">'UTC'</span>)
        rounded_payout = math.ceil(float(payout[<span class="hljs-string">'value'</span>]))

        <span class="hljs-comment"># Use the first column for the text and the second for the SVG</span>
        cols[<span class="hljs-number">0</span>].markdown(<span class="hljs-string">f"<span class="hljs-subst">{formatted_time}</span> → <span class="hljs-subst">{rounded_payout}</span>"</span>)
        display_svg(cols[<span class="hljs-number">1</span>], <span class="hljs-string">"assets/data_token.svg"</span>, width=<span class="hljs-number">20</span>, height=<span class="hljs-number">20</span>)

        <span class="hljs-comment"># Display the latest codes in the third column</span>
        display_latest_codes(node_data, cols[<span class="hljs-number">2</span>])
        st.divider()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>() -&gt; <span class="hljs-keyword">None</span>:</span>
    <span class="hljs-string">"""
    The main function of the Streamlit app. 
    It asks the user for a Streamr node Ethereum address, fetches data for the node, and displays it.

    Returns:
    None
    """</span>
    st.title(<span class="hljs-string">"⚡ Streamr Node Dashboard App ⚡"</span>)
    node_address = st.text_input(
        <span class="hljs-string">"Enter a Streamr Node Ethereum address here"</span>, placeholder=<span class="hljs-string">"0x4a2A3501e50759250828ACd85E7450fb55A10a69"</span>, max_chars=<span class="hljs-number">42</span>)
    <span class="hljs-keyword">with</span> st.expander(<span class="hljs-string">'Copy the address in this expander and paste above for testing 🎉'</span>):
        st.code(<span class="hljs-string">'''0x4a2A3501e50759250828ACd85E7450fb55A10a69'''</span>)
    <span class="hljs-keyword">if</span> node_address:
        logging.info(<span class="hljs-string">f"Processing node address <span class="hljs-subst">{node_address}</span>"</span>)
        <span class="hljs-keyword">if</span> re.match(<span class="hljs-string">"^0x[a-fA-F0-9]{40}$"</span>, node_address):
            node_data = fetch_node_data(node_address)
            <span class="hljs-keyword">if</span> node_data <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">None</span> <span class="hljs-keyword">and</span> <span class="hljs-string">'data'</span> <span class="hljs-keyword">in</span> node_data <span class="hljs-keyword">and</span> <span class="hljs-string">'node'</span> <span class="hljs-keyword">in</span> node_data[<span class="hljs-string">'data'</span>]:
                get_metrics_data(node_address)
                display_node_info(node_address, node_data)
                display_payouts(node_data)
            <span class="hljs-keyword">else</span>:
                logging.error(
                <span class="hljs-string">f"Failed to fetch data for address <span class="hljs-subst">{node_address}</span>. Please make sure it is a valid Streamr node address."</span>)
                st.error(
                <span class="hljs-string">"Failed to fetch data for the given Ethereum address. Please make sure it is a valid Streamr node address."</span>)
        <span class="hljs-keyword">else</span>:
        logging.error(
        <span class="hljs-string">f"Invalid Ethereum address: <span class="hljs-subst">{node_address}</span>. It should be 42 characters long (including '0x') and hexadecimal."</span>)
        st.error(
        <span class="hljs-string">"Invalid Ethereum address. It should be 42 characters long (including '0x') and hexadecimal."</span>)
    <span class="hljs-keyword">else</span>:
        logging.warning(
            <span class="hljs-string">"No Streamr node Ethereum address provided..."</span>)
        st.warning(
            <span class="hljs-string">"Please enter a Streamr node Ethereum address to fetch data..."</span>)

    st.markdown(<span class="hljs-string">"🔗 **Useful Links**"</span>)
    st.markdown(<span class="hljs-string">"- [Streamr Network](https://streamr.network/)"</span>)
    st.markdown(<span class="hljs-string">"- [Streamr Hub](https://streamr.network/projects)"</span>)
    st.markdown(<span class="hljs-string">"- [Earn $DATA](https://frens.streamr.network/intro)"</span>)
    st.markdown(<span class="hljs-string">"- [Streamr Twitter](https://twitter.com/streamr)"</span>)
    st.markdown(
        <span class="hljs-string">"💡 **Remember:** Keep building and shipping for a robust decentralized data economy!"</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    main()
</code></pre>
<blockquote>
<p><strong><em>Congratulations, you’ve created a stunning dashboard! 🎉</em></strong></p>
</blockquote>
<iframe src="https://streamr.streamlit.app/?embed=true" width="560" height="630"></iframe>

<p>Feel free to clone the full codebase in the <a target="_blank" href="https://github.com/tonykipkemboi/StreamrDashboard.git?ref=thedataengineerblog.com">repo</a>.</p>
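<p>To run the app locally, something along these lines should work (the <code>requirements.txt</code> and <code>app.py</code> names are assumptions — adjust to whatever the repo actually contains):</p>
<pre><code class="lang-bash">git clone https://github.com/tonykipkemboi/StreamrDashboard.git
cd StreamrDashboard
pip install -r requirements.txt   # assumed filename; check the repo
streamlit run app.py              # assumed entry point; check the repo
</code></pre>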
<p>If you run into any issues, don’t hesitate to reach out to me or post in the comments!</p>
<p>This article was originally published on <a target="_blank" href="https://blog.streamr.network/build-a-streamr-node-dashboard-with-streamlit/">Streamr Blog</a> on June 13, 2023.</p>
]]></content:encoded></item><item><title><![CDATA[The 4 Common Data Formats in Data Engineering]]></title><description><![CDATA[Introduction
Choosing the right data format is an integral part of data engineering. The decision significantly influences data storage, processing speed, and interoperability.
This article dissects four popular data formats: CSV, JSON, Parquet, and ...]]></description><link>https://thedataengineerblog.com/the-4-common-data-formats-in-data-engineering-e42917729af8</link><guid isPermaLink="true">https://thedataengineerblog.com/the-4-common-data-formats-in-data-engineering-e42917729af8</guid><category><![CDATA[Data Science]]></category><category><![CDATA[data structures]]></category><category><![CDATA[data]]></category><category><![CDATA[csv]]></category><category><![CDATA[Apache Avro]]></category><category><![CDATA[json]]></category><category><![CDATA[Parquet]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[Python]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Tony Kipkemboi]]></dc:creator><pubDate>Tue, 30 May 2023 13:01:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/hXrPSgGFpqQ/upload/143dcf857977dc3bcc1eab3b4ede0530.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-introduction">Introduction</h3>
<p>Choosing the right data format is an integral part of data engineering. The decision significantly influences data storage, processing speed, and interoperability.</p>
<p>This article dissects four popular data formats: <em>CSV, JSON, Parquet</em>, and <em>Avro</em>, each with unique strengths and ideal use cases. It also includes Python code snippets demonstrating reading and writing in each file format.</p>
<h3 id="heading-csv-comma-separated-values">CSV (Comma-Separated Values)</h3>
<p>CSV is a simple file format that organizes data into tabular form. Each line in a CSV file represents a record, and commas separate individual fields.</p>
<p><strong>Structure:</strong> CSV files represent data in a tabular format. Each row corresponds to a data record, and each column represents a data field.</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>CSV files are simple, lightweight, and human-readable.</p>
</li>
<li><p>They are broadly supported across platforms and programming languages.</p>
</li>
<li><p>Parsing CSV files is straightforward due to their simple structure.</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>CSV files lack a standard schema, leading to potential inconsistencies in data interpretation.</p>
</li>
<li><p>They do not support complex data types or hierarchical or relational data.</p>
</li>
<li><p>They are inefficient for large datasets due to slower read/write speeds.</p>
</li>
</ul>
<p><strong>Use Cases:</strong> CSV is a practical choice for simple, flat data structures and smaller datasets where human readability is crucial.</p>
<p><strong>Fun Fact:</strong> CSV was first supported by IBM Fortran in 1972, largely because it was easier to type CSV lists on punched cards​.</p>
<p><strong>Reading CSV:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

data = pd.read_csv(<span class="hljs-string">'data.csv'</span>)
print(data.head())
</code></pre>
<p><strong>Writing CSV:</strong></p>
<pre><code class="lang-python">data.to_csv(<span class="hljs-string">'new_data.csv'</span>, index=<span class="hljs-literal">False</span>)
</code></pre>
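<p>The "no standard schema" con is easy to demonstrate: because CSV stores everything as text, readers must guess each column's type, and the guess can lose information. Below is a minimal sketch with pandas (the column names are invented for illustration):</p>

```python
import io

import pandas as pd

# CSV carries no schema, so readers must infer each column's type.
csv_text = "employee_id,zip_code\n007,08540\n042,10001\n"

# By default, pandas infers integers and silently drops the leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["zip_code"].tolist())  # [8540, 10001]

# Forcing string dtypes preserves the values exactly as written.
preserved = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(preserved["zip_code"].tolist())  # ['08540', '10001']
```

<p>Passing an explicit <code>dtype</code> (or a per-column mapping) is the usual guard against this kind of silent reinterpretation.</p>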
<h3 id="heading-json-javascript-object-notation">JSON (JavaScript Object Notation)</h3>
<p>JSON is a data-interchange format that uses human-readable text to store and transmit objects comprising attribute-value pairs and array data types.</p>
<p><strong>Structure:</strong> JSON data is represented as key-value pairs and supports complex nested structures. It allows the use of arrays and objects, enabling a flexible and dynamic schema.</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>JSON files support complex data structures, including nested objects and arrays.</p>
</li>
<li><p>They are language-independent, interoperable, and widely used in web APIs due to their compatibility with JavaScript.</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>JSON files can be inefficient for large datasets because of their verbose nature and repeated field names.</p>
</li>
<li><p>They are not ideal for binary data storage.</p>
</li>
</ul>
<p><strong>Use Cases:</strong> JSON is the go-to data format for data interchange between web applications and APIs, especially when dealing with complex data structures.</p>
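<p>As a quick illustration of the nested structures JSON supports, here is a minimal sketch using only the standard library (the record contents are invented for the example):</p>

```python
import json

# A record with a nested object and an array -- structures flat CSV cannot express.
record = {
    "user": {"name": "Ada", "roles": ["engineer", "admin"]},
    "active": True,
}

# Serialize to text and parse it back; the nesting survives the round trip.
text = json.dumps(record)
restored = json.loads(text)
print(restored["user"]["roles"])  # ['engineer', 'admin']
```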
<p><strong>Fun Fact:</strong> Douglas Crockford and Chip Morningstar sent the first JSON message in <a target="_blank" href="https://en.wikipedia.org/wiki/JSON?ref=thedataengineerblog.com">April 2001</a>​.</p>
<p><strong>Reading JSON:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json

<span class="hljs-keyword">with</span> open(<span class="hljs-string">'data.json'</span>) <span class="hljs-keyword">as</span> f:
    data = json.load(f)
    print(data)
</code></pre>
<p><strong>Writing JSON:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">with</span> open(<span class="hljs-string">'new_data.json'</span>, <span class="hljs-string">'w'</span>) <span class="hljs-keyword">as</span> f:
    json.dump(data, f)
</code></pre>
<h3 id="heading-parquet">Parquet</h3>
<p>Apache Parquet is a columnar storage file format available to any project in the Hadoop ecosystem.</p>
<p><strong>Structure:</strong> Parquet arranges data by columns, allowing efficient read operations on a subset of the columns. It offers advanced data compression and encoding schemes.</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Parquet files offer efficient disk I/O and are suitable for query performance due to their columnar storage.</p>
</li>
<li><p>They support complex nested data structures and offer high compression, reducing storage costs.</p>
</li>
<li><p>They are compatible with many data processing frameworks, such as Apache Hadoop, Apache Spark, and Google BigQuery.</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>Parquet files are not human-readable.</p>
</li>
<li><p>They have slower write operations due to compressing and encoding data overhead.</p>
</li>
</ul>
<p><strong>Use Cases:</strong> Parquet is the preferred choice for analytical queries and big data operations, where efficient columnar reads are more crucial than write performance.</p>
<p><strong>Fun Fact:</strong> Parquet was designed to improve the Trevni columnar storage format created by Doug Cutting, the creator of Hadoop. The first version, Apache Parquet 1.0, was released in <a target="_blank" href="https://en.wikipedia.org/wiki/Apache_Parquet">July 2013​</a>.</p>
<p><strong>Reading Parquet (requires the </strong><code>pyarrow</code><strong> or </strong><code>fastparquet</code><strong> library):</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

data = pd.read_parquet(<span class="hljs-string">'data.parquet'</span>)
print(data.head())
</code></pre>
<p><strong>Writing Parquet:</strong></p>
<pre><code class="lang-python">data.to_parquet(<span class="hljs-string">'new_data.parquet'</span>)
</code></pre>
<h3 id="heading-avro">Avro</h3>
<p>Apache Avro is a row-based storage format designed for data serialization in big data applications.</p>
<p><strong>Structure:</strong> Avro stores data definition in JSON format and data in binary format, facilitating compact, fast binary serialization and deserialization. It also supports schema evolution.</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Avro files provide a compact binary data format that supports schema evolution.</p>
</li>
<li><p>They offer fast read/write operations, making them suitable for real-time processing.</p>
</li>
<li><p>They are widely used with Kafka and Hadoop for data serialization.</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>Avro files are not human-readable.</p>
</li>
<li><p>They require the schema to read the data, adding a layer of complexity.</p>
</li>
</ul>
<p><strong>Use Cases:</strong> Avro is the optimal choice for big data applications requiring fast serialization/deserialization and for systems that need the flexibility of schema evolution.</p>
<p><strong>Fun Fact:</strong> Avro was developed by the creator of Apache Hadoop, Doug Cutting, specifically to address big data challenges. The initial release of Avro was on <a target="_blank" href="https://en.wikipedia.org/wiki/Apache_Avro">November 2, 2009​</a>.</p>
<p><strong>Reading Avro (requires the </strong><code>avro</code><strong> library):</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> avro.datafile <span class="hljs-keyword">import</span> DataFileReader
<span class="hljs-keyword">from</span> avro.io <span class="hljs-keyword">import</span> DatumReader

<span class="hljs-keyword">with</span> DataFileReader(open(<span class="hljs-string">"data.avro"</span>, <span class="hljs-string">"rb"</span>), DatumReader()) <span class="hljs-keyword">as</span> reader:
    <span class="hljs-keyword">for</span> record <span class="hljs-keyword">in</span> reader:
        print(record)
</code></pre>
<p><strong>Writing Avro:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> avro.datafile <span class="hljs-keyword">import</span> DataFileWriter
<span class="hljs-keyword">from</span> avro.io <span class="hljs-keyword">import</span> DatumWriter
<span class="hljs-keyword">from</span> avro.schema <span class="hljs-keyword">import</span> parse

<span class="hljs-comment"># you need a schema to write Avro</span>
schema = parse(open(<span class="hljs-string">"data_schema.avsc"</span>, <span class="hljs-string">"rb"</span>).read()) 

<span class="hljs-keyword">with</span> DataFileWriter(open(<span class="hljs-string">"new_data.avro"</span>, <span class="hljs-string">"wb"</span>), DatumWriter(), schema) <span class="hljs-keyword">as</span> writer:
    writer.append({<span class="hljs-string">"name"</span>: <span class="hljs-string">"test"</span>, <span class="hljs-string">"favorite_number"</span>: <span class="hljs-number">7</span>, <span class="hljs-string">"favorite_color"</span>: <span class="hljs-string">"red"</span>})
</code></pre>
<p>**Note:**<em>For Avro, you need a schema to write data. The schema is a JSON object that defines the data structure.</em></p>
<h3 id="heading-choosing-the-right-format">Choosing the Right Format</h3>
<p>The decision to select a data format isn’t a one-size-fits-all situation. It largely depends on several factors, such as the nature and volume of your data, the type of operations you’ll perform, and the storage capacity.</p>
<p>While CSV and JSON are excellent for simplicity and interoperability, Parquet and Avro stand out when dealing with big data due to their read operations and serialization efficiencies.</p>
<p>To make a well-informed decision:</p>
<ul>
<li><p><strong>Evaluate the structure of your data</strong>: Is it flat or nested? Simple or complex?</p>
</li>
<li><p><strong>Consider the volume of data</strong>: Large datasets may require efficient formats like Parquet or Avro.</p>
</li>
<li><p><strong>Think about the operations</strong>: Are you performing more read operations or write operations? Do you need real-time processing?</p>
</li>
<li><p><strong>Consider the storage</strong>: Columnar formats like Parquet offer high compression, reducing storage costs.</p>
</li>
<li><p><strong>Consider interoperability</strong>: Do you need to share this data with other systems?</p>
</li>
</ul>
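<p>These heuristics can be condensed into a toy helper function. This is purely illustrative shorthand for the checklist above, not a real decision procedure:</p>

```python
def suggest_format(nested: bool, large: bool, write_heavy: bool,
                   human_readable: bool) -> str:
    """Toy heuristic condensing the checklist above; not a hard rule."""
    if human_readable:
        # A human in the loop favors text formats.
        return "JSON" if nested else "CSV"
    if large and write_heavy:
        # Fast row-based serialization with schema evolution.
        return "Avro"
    if large:
        # Efficient columnar reads and high compression.
        return "Parquet"
    return "JSON" if nested else "CSV"

print(suggest_format(nested=False, large=True, write_heavy=False,
                     human_readable=False))  # Parquet
```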
<h3 id="heading-conclusion">Conclusion</h3>
<p>Understanding data formats and their strengths is important in the data engineering process. Whether you choose CSV, JSON, Parquet, or Avro, it’s about picking the right tool for your specific use case. As a data engineer, your role is to balance the trade-offs and choose the format that best serves your data, performance requirements, and business needs.</p>
<p>I hope this deep dive into CSV, JSON, Parquet, and Avro will guide you in your data format selection process.</p>
<p>Stay tuned for more technical content, and don’t forget to subscribe to receive updates when it ships!</p>
<h3 id="heading-further-reading">Further reading</h3>
<ol>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/Comma-separated_values?ref=thedataengineerblog.com">CSV</a></p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/JSON?ref=thedataengineerblog.com">JSON</a></p>
</li>
<li><p><a target="_blank" href="https://parquet.apache.org/?ref=thedataengineerblog.com">Apache Parquet</a></p>
</li>
<li><p><a target="_blank" href="https://avro.apache.org/?ref=thedataengineerblog.com">Avro</a></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[I am Leaving Data Engineering for Developer Relations]]></title><description><![CDATA[I am stepping away from my Data Engineering role at Booz Allen Hamilton and joining Snowflake as a Developer Relations Associate working on Streamlit.
I am humbled and excited to have the opportunity to work with an open-source tool that I am passion...]]></description><link>https://thedataengineerblog.com/i-am-leaving-data-engineering-for-developer-relations</link><guid isPermaLink="true">https://thedataengineerblog.com/i-am-leaving-data-engineering-for-developer-relations</guid><category><![CDATA[Developer]]></category><category><![CDATA[DevRel]]></category><category><![CDATA[Python]]></category><category><![CDATA[streamlit]]></category><category><![CDATA[snowflake]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[marketing]]></category><category><![CDATA[product]]></category><category><![CDATA[engineering]]></category><dc:creator><![CDATA[Tony Kipkemboi]]></dc:creator><pubDate>Wed, 09 Nov 2022 12:45:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1667866006424/WEJfLgnYB.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I am stepping away from my Data Engineering role at <a target="_blank" href="https://www.boozallen.com/">Booz Allen Hamilton</a> and joining <a target="_blank" href="https://www.snowflake.com/en/">Snowflake</a> as a Developer Relations Associate working on Streamlit.</p>
<p>I am humbled and excited to have the opportunity to work with an open-source tool that I am passionate about and a great team.</p>
<p>First, let me take you back to where it all started👇🏿</p>
<h2 id="heading-my-time-in-the-military">My Time in the Military</h2>
<p>I served in the U.S. Army for seven years before transitioning out in September 2021. Apart from being a soldier, my <a target="_blank" href="https://www.goarmy.com/careers-and-jobs/career-match/science-medicine/research/68k-medical-laboratory-specialist.html">Military Occupational Specialty</a> (a.k.a. MOS)—think job title—was a Medical Laboratory Technician.</p>
<p><a target="_blank" href="https://labscientists.wordpress.com/medical-laboratory-humor/"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667511162751/njwEhr4Yn.jpg" alt="med-tech_new-grad.jpg" class="image--center mx-auto" /></a></p>
<p>In my first four years of service, I worked at a military health center; my daily tasks included checking in patients, making blood smears, streak-plating bacterial cultures, packaging samples for shipment, and performing phlebotomy.</p>
<p>My final three years were at a military research institute (<a target="_blank" href="https://usamriid.health.mil/">USAMRIID</a>). It is during my time here that I found my interest in coding. I worked in Genomics for my assignment and got the chance to interface with Bioinformaticians using Python to analyze the genomic data that I and other research assistants generated from the wet lab. I will save the granular details for another article.</p>
<p>Serving in the military gave me many wonderful experiences and an education to be appreciated. The valuable experiences I acquired while serving will stay with me forever.</p>
<h2 id="heading-transition-to-private-sector">Transition to Private Sector</h2>
<p>I knew it was time to transition out of the military early in my last tour of service. I remember feeling very stressed and lost between late 2019 and April 2020. The military is all I had known for seven years, and that was my first job.</p>
<p><a target="_blank" href="https://veteranlife.com/lifestyle/veteran-memes/"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667510156719/Ejt0Yssve.jpg" alt="mos-skill-translator.jpg" class="image--center mx-auto" /></a></p>
<p>I entertained the idea of pursuing medicine as a career after service since that was my goal for a long time. After shadowing several doctors, I decided that was not for me anymore.</p>
<p>I gave tech a chance while working in Genomics and interacting with the Bioinformaticians who urged me to continue learning and even gave me simple automation project ideas I could implement at work, motivating me to learn Python. A few months of self-teaching went by, and after applying to grad school, I got my acceptance to the University of Pennsylvania’s <a target="_blank" href="https://online.seas.upenn.edu/degrees/mcit-online/">Masters in Computer and Information Technology</a>.</p>
<p>Six months before my transition, I had the opportunity to intern at <a target="_blank" href="https://www.merck.com/">Merck &amp; Co.</a> as a Data Engineer. I had a fantastic supervisor who was very supportive and still is to this date. During my internship, I was interviewing for a role at <a target="_blank" href="https://www.bloomberg.com/company/">Bloomberg L.P.</a>, where I got an offer and had to relocate to the New York City area. I could not have found a better landing coming out of the military.</p>
<p>My time at Bloomberg was pivotal to my journey, and I need a dedicated article to cover all the highlights. Working at Bloomberg was my first tech role and first private sector job. I made great friends and learned much about the financial markets and data. Due to family and personal reasons, we had to relocate out of the area; tl;dr raising a child in the city is not easy for us folks used to open spaces and quietness.</p>
<p>I found my next role with Booz Allen Hamilton as a Data Engineer. Working in the government consulting space was a little familiar from my time in service, and like anything in the government, things moved a little slower compared to the private sector. I enjoyed the teamwork and the flexibility to learn new technologies like cloud services.</p>
<h2 id="heading-why-pivot-to-developer-relations">Why Pivot to Developer Relations?</h2>
<p>Up to this point in my tech career, I have focused on continuous learning by building side projects on top of my day job. In late 2021, I started a YouTube channel where I create blockchain and Python tutorials. I have since started writing articles and taking on speaking engagements to share my journey and give coding workshops.</p>
<p>Honestly, I have felt happier letting my thoughts pour out in written and video format. I enjoy teaching and aspire to break down technical concepts into digestible bits, especially for newbies.</p>
<p>What is Developer Relations (”DevRel”), you might wonder? DevRel is an umbrella term covering different roles, such as:</p>
<ul>
<li><p>Developer Experience (DevX)</p>
</li>
<li><p>Developer Advocate</p>
</li>
<li><p>Developer Evangelist</p>
</li>
<li><p>Developer Marketing</p>
</li>
</ul>
<p>From the roles above, you can deduce that DevRel is a multifaceted role sitting at the intersection of engineering, product, and marketing.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667506775079/-SDwDVkVh.png" alt="devrel_venn_1.png" /></p>
<p>The scope of DevRel responsibilities varies by company, but you can expect to work on some combination of writing documentation and articles, building tools and code samples, producing video content, speaking at conferences, and helping developers with blockers they encounter via channels like Stack Overflow and Reddit, to name a few. DevRel enables organizations to collect user input, creating a feedback-loop mechanism that helps improve the product for its users.</p>
<h2 id="heading-why-streamlit-at-snowflake">Why Streamlit🎈 at Snowflake❄️?</h2>
<p>TL;DR: Streamlit:</p>
<ul>
<li><p>It is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science.</p>
</li>
<li><p>It turns data scripts into shareable web apps in minutes.</p>
</li>
<li><p>All in pure Python.</p>
</li>
<li><p>No front-end experience is required!</p>
</li>
</ul>
<iframe width="781" height="438" src="https://s3-us-west-2.amazonaws.com/assets.streamlit.io/videos/hero-video.mp4"></iframe>

<p><em>Streamlit video demo: https://streamlit.io/</em></p>
<p>In this video, <a target="_blank" href="https://www.linkedin.com/in/adrien-treuille-52215718/">Adrien Treuille</a>, Head of Streamlit at Snowflake, demonstrates how to build interactive apps in Snowflake using Streamlit.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=e8kZQDKeNwk">https://www.youtube.com/watch?v=e8kZQDKeNwk</a></div>
<p> </p>
<p>When I started learning Python in 2020, I found that I needed to make my projects interactive and user-facing. Naturally, I tried using Flask and Django, but it was a bit painful as a newbie. I stumbled upon Streamlit, and since then, it has been my go-to module. Here are some of the apps I have built using Streamlit;</p>
<ul>
<li><p><a target="_blank" href="https://tonykipkemboi-scrapetwitter-appdriver-qevmgf.streamlit.app/">Twitter Scraper using “snscrape” Module</a>: A user enters a phrase in the search box, the number of records to return, and a date range. There’s an option to download the dataset to a CSV file.</p>
</li>
<li><p><a target="_blank" href="https://tonykipkemboi-mvrvdashboardapp-app-zgy2ml.streamlitapp.com/">MVRV (Market Value to Realized Value) Dashboard App</a>: This app I built during a web3 hackathon a few months ago. I used <a target="_blank" href="https://glassnode.com/">Glassnode’s API</a> to get the data and display it on the app. I won a bounty for the app! The app shows the MVRV ratio (tl;dr the ratio tells us if the price of a token/stock is fair or not) of Bitcoin, Ethereum, or Litecoin.</p>
</li>
<li><p><a target="_blank" href="https://tonykipkemboi-sentimentanalysisapp-streamlit-app-i5a9o9.streamlitapp.com/">Yelp Reviews Sentiment Analysis WebApp</a>: One of the first Streamlit apps I built. The app scrapes Yelp reviews and scores the sentiment using Hugging Faces’ BERT model.</p>
</li>
</ul>
<p>My MVRV app was featured in the Streamlit Weekly Roundup in March, and I got fantastic stickers and a handwritten letter; those hardly come by these days!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667864527943/tkjjYq1mU.jpg" alt="streamlit_app.jpg" /></p>
<p>Considering my career path and interests, being a DevRel for Streamlit at Snowflake is an exciting and obvious decision.</p>
<p>I am excited to work alongside a talented team that includes one of my favorite data content creators <a target="_blank" href="https://twitter.com/thedataprof">Chanin Nantasenamat, a.k.a. “The Data Professor</a>.” I recommend you subscribe to his <a target="_blank" href="https://www.youtube.com/dataprofessor">YouTube</a> channel.</p>
<p>If you are new to Python or a seasoned developer looking to start with Streamlit, I recommend you start <a target="_blank" href="https://docs.streamlit.io/library/get-started">here</a>. Please don’t hesitate to <a target="_blank" href="https://twitter.com/_townee">reach out and connect with me anytime</a>.</p>
<p><em>Happy Streamlit-ing🎈</em></p>
]]></content:encoded></item><item><title><![CDATA[How to Query The Graph Protocol for Onchain Data using Python]]></title><description><![CDATA[In this tutorial, we will query the ENS Subgraph using two methods; raw GraphQL query and Subgrounds library by Playgrounds.
The goal is for you to be able to:

query any Subgraph data using Python

understand the two querying methods



What are Sub...]]></description><link>https://thedataengineerblog.com/how-to-query-the-graph-protocol-for-onchain-data-using-python</link><guid isPermaLink="true">https://thedataengineerblog.com/how-to-query-the-graph-protocol-for-onchain-data-using-python</guid><category><![CDATA[Web3]]></category><category><![CDATA[Blockchain]]></category><category><![CDATA[GraphQL]]></category><category><![CDATA[Python]]></category><category><![CDATA[#thegraph]]></category><category><![CDATA[web3.0]]></category><category><![CDATA[Ethereum]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[data-engineering]]></category><dc:creator><![CDATA[Tony Kipkemboi]]></dc:creator><pubDate>Fri, 09 Sep 2022 10:30:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1662677533071/E7f0dP14x.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this tutorial, we will query the ENS Subgraph using two methods; raw GraphQL query and Subgrounds library by Playgrounds.</p>
<p>The goal is for you to be able to:</p>
<ul>
<li><p>query any Subgraph data using Python</p>
</li>
<li><p>understand the two querying methods</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662672090296/i-UaTPbFD.png" alt="subgraph_query.png" /></p>
<h2 id="heading-what-are-subgraphs-anyway">What are Subgraphs anyway?</h2>
<p>The TL;DR definition of <a target="_blank" href="https://thegraph.com/en/">The Graph</a> protocol is described by <a target="_blank" href="https://www.tegankline.com/about">Tegan Kline</a>, a Co-Founder of The Graph protocol, in her tweet;</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662672196937/4_Pe1lPhQ.png" alt="what_is_The_Graph.png" /></p>
<p>The Graph is a decentralized indexing protocol that allows developers to build open-source APIs called subgraphs to query networks like Ethereum, IPFS, and other supported chains. At the time of writing, there are 522 subgraphs deployed, and the number keeps growing. Anyone can query subgraphs for on-chain data, as we will in this tutorial.</p>
<p>Now let’s get querying!</p>
<h2 id="heading-tech-stack">Tech Stack</h2>
<ul>
<li><p>Coding Language: Python</p>
</li>
<li><p>IDE/Coding Platform: Jupyter Notebook (Anaconda)</p>
</li>
<li><p>GraphQL: an open-source data query language (the ”QL” in GraphQL) used for APIs</p>
</li>
<li><p>Blockchain API: a Subgraph (the ENS Subgraph, for demo purposes)</p>
</li>
</ul>
<h2 id="heading-getting-started">Getting Started</h2>
<p><strong>Requirements</strong>: To follow this tutorial, you will need <a target="_blank" href="https://www.python.org/downloads/">Python 3.10</a> and <a target="_blank" href="https://www.anaconda.com/">Anaconda</a> installed on your system.</p>
<h2 id="heading-using-raw-graphql-to-query-subgraphs-with-python">Using Raw GraphQL to Query Subgraphs with Python</h2>
<h3 id="heading-step-1-setup-coding-environment">Step 1: Setup coding environment</h3>
<p>Once you have installed Python and Anaconda, open your command line, create a folder, and change the directory into your folder:</p>
<pre><code class="lang-bash">mkdir &lt;your_folder_name&gt; &amp;&amp; cd &lt;your_folder_name&gt;
</code></pre>
<p>Create a Python virtual environment to keep our project dependencies isolated:</p>
<pre><code class="lang-bash">python -m venv env
</code></pre>
<p>Activate the virtual environment (env); after successful activation, you should see the environment name prefixed to your prompt, like <code>(env) C:\</code> (on macOS/Linux, use <code>source env/bin/activate</code> instead):</p>
<pre><code class="lang-bash">.\env\Scripts\activate
</code></pre>
<p>Now that we have our environment up and ready, let’s install some libraries that our project will depend on for querying data:</p>
<pre><code class="lang-bash">pip install pandas requests
</code></pre>
<p>To confirm you have the needed packages (pandas and requests), use pip to check:</p>
<pre><code class="lang-bash">pip freeze
</code></pre>
<p>Since we will be using the virtual environment in Jupyter Notebook, we need to add it as such:</p>
<ul>
<li>install the <a target="_blank" href="https://github.com/ipython/ipykernel">ipykernel</a> package, which provides the IPython kernel for Jupyter:</li>
</ul>
<pre><code class="lang-bash">  pip install --user ipykernel
</code></pre>
<ul>
<li>add virtual environment to Jupyter by typing:</li>
</ul>
<pre><code class="lang-bash">  python -m ipykernel install --name=env
</code></pre>
<p>After running the command above, you should see something like this:</p>
<p><code>Installed kernelspec env in C:\ProgramData\jupyter\kernels\env</code></p>
<p>The final step of the setup is to open Jupyter Notebook; run this command:</p>
<pre><code class="lang-bash">jupyter notebook
</code></pre>
<p>A tab will open in your browser with Jupyter on localhost.</p>
<p>Locate the “New” tab and choose <code>env</code> to open a notebook with your created virtual environment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662673096335/0ICJ1yKKJ.png" alt="choose_env.png" /></p>
<p>Now we are ready to roll!</p>
<h3 id="heading-step-2-prepare-raw-query">Step 2: Prepare raw query</h3>
<p>For this tutorial, we will query the <a target="_blank" href="https://thegraph.com/hosted-service/subgraph/ensdomains/ens?selected=playground">ENS(Ethereum Name Service) Subgraph</a> for some fun data. <a target="_blank" href="https://ens.domains">ENS</a> is a naming system based on the Ethereum blockchain. The common use of ENS has been to map human-readable domain names like <code>vitalik.eth</code> to machine-readable Ethereum addresses. ENS is very popular in the Web3 space; hence the high activity of registrations as seen on <a target="_blank" href="https://etherscan.io/address/0x283af0b28c62c092c9727f1ee09c02ca627eb7f5">ENS Etherscan Registrar Controller</a>.</p>
<p>Let’s say we are interested in obtaining the latest data about registered domains to answer these questions;</p>
<ul>
<li><p>what are the latest domain names registered?</p>
</li>
<li><p>who are the registrants of the names (hexadecimal ETH addresses)?</p>
</li>
<li><p>when did the registration happen?</p>
</li>
<li><p>what was the cost of registration in ETH?</p>
</li>
<li><p>what is the expiry date of the domain names?</p>
</li>
</ul>
<p>The Graph protocol provides a <a target="_blank" href="https://thegraph.com/hosted-service/subgraph/ensdomains/ens">playground</a> and <a target="_blank" href="https://api.thegraph.com/subgraphs/name/ensdomains/ens/graphql?query=%0A++++%23%0A++++%23+Welcome+to+The+GraphiQL%0A++++%23%0A++++%23+GraphiQL+is+an+in-browser+tool+for+writing%2C+validating%2C+and%0A++++%23+testing+GraphQL+queries.%0A++++%23%0A++++%23+Type+queries+into+this+side+of+the+screen%2C+and+you+will+see+intelligent%0A++++%23+typeaheads+aware+of+the+current+GraphQL+type+schema+and+live+syntax+and%0A++++%23+validation+errors+highlighted+within+the+text.%0A++++%23%0A++++%23+GraphQL+queries+typically+start+with+a+%22%7B%22+character.+Lines+that+start%0A++++%23+with+a+%23+are+ignored.%0A++++%23%0A++++%23+An+example+GraphQL+query+might+look+like%3A%0A++++%23%0A++++%23+++++%7B%0A++++%23+++++++field%28arg%3A+%22value%22%29+%7B%0A++++%23+++++++++subField%0A++++%23+++++++%7D%0A++++%23+++++%7D%0A++++%23%0A++++%23+Keyboard+shortcuts%3A%0A++++%23%0A++++%23++Prettify+Query%3A++Shift-Ctrl-P+%28or+press+the+prettify+button+above%29%0A++++%23%0A++++%23+++++Merge+Query%3A++Shift-Ctrl-M+%28or+press+the+merge+button+above%29%0A++++%23%0A++++%23+++++++Run+Query%3A++Ctrl-Enter+%28or+press+the+play+button+above%29%0A++++%23%0A++++%23+++Auto+Complete%3A++Ctrl-Space+%28or+just+start+typing%29%0A++++%23%0A++">explorer</a> where anyone can write custom queries to pull data from any given subgraph.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662673275771/xwBDHhrMp.png" alt="ENS playground on The Graph website" /></p>
<p>Compared to the playground, where you must type queries manually, the explorer is much easier to use because you can build queries by selecting fields with its radio buttons.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662674239005/Hzy3ED1gV.png" alt="ENS explorer" /></p>
<p>We will play around with the explorer by selecting entities and querying data until we find a query that returns the data to answer our questions above; this is the final query:</p>
<pre><code class="lang-graphql">query ENSData {
  <span class="hljs-comment"># latest 1000 ENS registrations</span>
  registrations(first:<span class="hljs-number">1000</span>, orderBy:registrationDate, orderDirection:desc){
    domain{
      name <span class="hljs-comment"># like `vitalik.eth`</span>
    }
    registrant {
      id <span class="hljs-comment"># hexadecimal address</span>
    }
    registrationDate 
    cost 
    expiryDate 
  }
}
</code></pre>
<h3 id="heading-step-3-query-with-python-in-jupyter-notebook">Step 3: Query with Python in Jupyter Notebook</h3>
<p>It is time to use Python to get data using the query we prepared above.</p>
<p>In the Jupyter Notebook we created earlier, use the first cell to import the dependencies we will need:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> requests
</code></pre>
<p>Now let’s save our query above in a variable and create a function that will handle sending the payload (query) to make an API call to the ENS subgraph and receive data:</p>
<pre><code class="lang-python"><span class="hljs-comment"># variable holding the query payload</span>
query = <span class="hljs-string">"""
{
    registrations(first:1000, orderBy:registrationDate, orderDirection:desc){
        domain{
            name
        }
        registrant {
            id
        }
        registrationDate
        cost
        expiryDate
    }
}
"""</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_data</span>(<span class="hljs-params">query</span>):</span>
    <span class="hljs-string">"""This function posts a request to make an API call to ENS Subgraph URL
    parameters:
    ------------
    query: payload containing specific data we need
    return:
    -------
    response.json(): queried data in JSON format
    """</span>

    response = requests.post(<span class="hljs-string">'https://api.thegraph.com/subgraphs/name/ensdomains/ens'</span>,
                             json={<span class="hljs-string">"query"</span>: query})

    <span class="hljs-keyword">if</span> response.status_code == <span class="hljs-number">200</span>: <span class="hljs-comment"># code 200 means no errors</span>
        <span class="hljs-keyword">return</span> response.json()
    <span class="hljs-keyword">else</span>: <span class="hljs-comment"># otherwise, raise with the status code for debugging</span>
        <span class="hljs-keyword">raise</span> Exception(<span class="hljs-string">"Query failed with return code {}"</span>.format(response.status_code))
</code></pre>
<p>Make sure you run each cell up to this point using the run button on the Notebook or <code>Ctrl + Enter</code> on the keyboard. The final step to get data is to invoke the function and pass in the query:</p>
<pre><code class="lang-python">data = get_data(query)
display(data)
</code></pre>
<p>Your output from running the function will be a nested structure of JSON objects and lists.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662674331571/evMgtKtZx.png" alt="JSON data output" /></p>
<p>And voilà, we have queried the latest ENS registration data that we need to answer our initial questions!</p>
<p>The next steps you can take would be to flatten the data, load it into a pandas DataFrame, and clean it up. The epoch times (<code>registrationDate</code> and <code>expiryDate</code>) are <a target="_blank" href="https://en.wikipedia.org/wiki/Unix_time">Unix timestamps</a> and would need to be converted to a human-readable datetime format. The same goes for converting <code>cost</code> from <code>Wei</code> to <code>ETH</code>.</p>
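<p>As a sketch of those next steps, the nested JSON can be flattened with <code>pandas.json_normalize</code> and the timestamp and cost fields converted. The single-registration sample below is hypothetical and shaped like the response above; the values are made up for illustration:</p>

```python
import pandas as pd

# hypothetical sample shaped like the subgraph response above
data = {
    "data": {
        "registrations": [
            {
                "domain": {"name": "vitalik.eth"},
                "registrant": {"id": "0xd8da6bf26964af9d7eed9e03e53415d37aa96045"},
                "registrationDate": "1580803619",
                "cost": "5000000000000000000",
                "expiryDate": "1739412419",
            }
        ]
    }
}

# flatten the nested JSON into one row per registration;
# nested keys become dotted column names like "domain.name"
df = pd.json_normalize(data["data"]["registrations"])

# convert the Unix timestamps to human-readable datetimes
for col in ["registrationDate", "expiryDate"]:
    df[col] = pd.to_datetime(df[col].astype(int), unit="s")

# convert cost from Wei to ETH (1 ETH = 10^18 Wei)
df["cost"] = df["cost"].astype(float) / 10**18
```

From here the DataFrame is ready for renaming columns and further analysis.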
<p>Now that we have queried data the 'more manual' way, let’s look at an 'easier' way of doing the same but with more robust advantages using Subgrounds.</p>
<h2 id="heading-using-subgrounds-python-library-to-query-subgraphs">Using Subgrounds Python Library to Query Subgraphs</h2>
<h3 id="heading-what-is-subgrounds">What is Subgrounds?</h3>
<p>Subgrounds is an open-source data access layer for querying, manipulating, and visualizing subgraph data. The library makes it easy for data professionals—or anyone—to access on-chain data in a familiar web2 stack. These are some of the highlighted benefits of using Subgrounds:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662732331322/RBZbs8lMg.png" alt="play.png" /></p>
<h3 id="heading-step-1-import-and-initialize-subgrounds-in-jupyter">Step 1: Import and initialize subgrounds in Jupyter</h3>
<p>In a new Jupyter Notebook cell below the last code above, import subgrounds and run the cell:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> subgrounds <span class="hljs-keyword">import</span> Subgrounds
</code></pre>
<p>Initialize Subgrounds:</p>
<pre><code class="lang-python">sg = Subgrounds()
</code></pre>
<h3 id="heading-step-2-load-ens-subgraph-url-and-create-field-paths">Step 2: Load ENS subgraph URL and create Field Paths</h3>
<p>Load the ENS subgraph using its API URL, which you can find here:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662674482326/cIBT9pQWG.png" alt="http.png" /></p>
<pre><code class="lang-python"><span class="hljs-comment"># load ENS subgraph </span>
ens = sg.load_subgraph(<span class="hljs-string">'https://api.thegraph.com/subgraphs/name/ensdomains/ens'</span>)
</code></pre>
<p>Subgrounds provides options for getting data in different formats: <code>query</code>, <code>query_df</code>, and <code>query_json</code>. Since we need to do some analysis with our data, we will have our data in a pandas DataFrame using <code>query_df</code>.</p>
<p>We will also have our data normalized by the library and use the <code>SyntheticField</code> class to define a human-readable timestamp transformation before querying. Let’s import <code>SyntheticField</code> from Subgrounds and define synthetic fields for both <code>registrationDate</code> and <code>expiryDate</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">from</span> subgrounds.subgraph <span class="hljs-keyword">import</span> SyntheticField

<span class="hljs-comment"># registrationdate synthetic field</span>
ens.Registration.registrationdate = SyntheticField(
    <span class="hljs-keyword">lambda</span> registrationDate: str(datetime.fromtimestamp(registrationDate)),
    SyntheticField.STRING,
    ens.Registration.registrationDate
)

<span class="hljs-comment"># expirydate synthetic field</span>
ens.Registration.expirydate = SyntheticField(
    <span class="hljs-keyword">lambda</span> expiryDate: str(datetime.fromtimestamp(expiryDate)),
    SyntheticField.STRING,
    ens.Registration.expiryDate
)
</code></pre>
<p>Now we are ready to add Field Paths, the main building blocks used to construct Subgrounds queries. A Field Path is a translation of the raw GraphQL schema, starting from the root query entity down to a scalar leaf field:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Select the latest 1000 registration names by registration datetime</span>
registrations = ens.Query.registrations(
    first=<span class="hljs-number">1000</span>, <span class="hljs-comment"># latest 1000 registrations</span>
    orderBy=ens.Registration.registrationDate, <span class="hljs-comment"># order registrations by time</span>
    orderDirection=<span class="hljs-string">"desc"</span> <span class="hljs-comment"># latest registration data will be first</span>
)

field_paths = [
    registrations.domain.name, <span class="hljs-comment"># ens domain like "vitalik.eth"</span>
    registrations.registrant.id, <span class="hljs-comment"># hexadecimal eth address</span>
    registrations.registrationdate, <span class="hljs-comment"># human-readable datetime (synthetic field)</span>
    registrations.cost, <span class="hljs-comment"># price for registration</span>
    registrations.expirydate <span class="hljs-comment"># expiry date of domain</span>
]
</code></pre>
<h3 id="heading-step-3-query-time">Step 3: Query Time!</h3>
<p>Now that we have the payload ready, let’s send the request for data and display the first five results:</p>
<pre><code class="lang-python"><span class="hljs-comment"># get data</span>
df = sg.query_df(field_paths)

<span class="hljs-comment"># print the first five results</span>
df.head()
</code></pre>
<p>The output will look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1662673616824/OPi35cqfB.png" alt="output.png" /></p>
<p>Boom! We got ourselves some interesting near real-time on-chain data!</p>
<h3 id="heading-step-4-perform-some-transformations-on-the-data">Step 4: Perform some Transformations on the Data</h3>
<p>Now that we have completed the Extract part of our ETL (<code>Extract</code>, <code>Transform</code>, and <code>Load</code>) process, we proceed to Transform. FYI, we won’t be performing the Load stage of ETL, but you can if you like.</p>
<p>The first item in our transformation is to convert the <code>registrations_cost</code> column values from <code>Wei</code> (the smallest denomination of ether) to <code>ether</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Convert `registrations_cost` column from wei to ether </span>
<span class="hljs-comment"># 1 ether = 1,000,000,000,000,000,000 wei (10^18)</span>
df[<span class="hljs-string">'registrations_cost'</span>] = df[<span class="hljs-string">'registrations_cost'</span>] / (<span class="hljs-number">10</span>**<span class="hljs-number">18</span>)
</code></pre>
<p>The next item would be to rename the columns for simplicity and standardization:</p>
<pre><code class="lang-python"><span class="hljs-comment"># rename columns for simplicity</span>
df = df.rename(columns={<span class="hljs-string">'registrations_domain_name'</span>: <span class="hljs-string">'ens_name'</span>,
                        <span class="hljs-string">'registrations_registrant_id'</span>: <span class="hljs-string">'owner_address'</span>,
                        <span class="hljs-string">'registrations_registrationdate'</span>: <span class="hljs-string">'registration_date'</span>,
                        <span class="hljs-string">'registrations_cost'</span>: <span class="hljs-string">'registration_cost_ether'</span>,
                        <span class="hljs-string">'registrations_expirydate'</span>: <span class="hljs-string">'expiry_date'</span>
                        })
<span class="hljs-comment"># inspect the changes in df</span>
df.head()
</code></pre>
<p>Final dataframe sample:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1663092197201/gT3UMEz1f.png" alt="DataFrame output" /></p>
<h2 id="heading-whats-next">What’s Next?</h2>
<p>If you’re interested in digging into the data to derive some fun insights, put on your data analyst monocle and dive into the data! You can interrogate the data to find the average registration cost over a period or create a dashboard to track trending topics extracted from registered names.</p>
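<p>As a starting point for that kind of analysis, here is a minimal pandas sketch using the column names from Step 4 on a small made-up sample (the names, dates, and costs are hypothetical):</p>

```python
import pandas as pd

# hypothetical sample with the column names from Step 4
df = pd.DataFrame({
    "ens_name": ["alice.eth", "bob.eth", "carol.eth"],
    "registration_date": ["2022-09-08 10:00:00",
                          "2022-09-08 11:30:00",
                          "2022-09-09 09:15:00"],
    "registration_cost_ether": [0.002, 0.004, 0.006],
})
df["registration_date"] = pd.to_datetime(df["registration_date"])

# average registration cost over the whole sample
avg_cost = df["registration_cost_ether"].mean()

# registrations per day
daily = df.set_index("registration_date").resample("D")["ens_name"].count()
print(f"average cost: {avg_cost:.4f} ETH")
print(daily)
```

The same pattern extends naturally to rolling averages or grouping by name suffixes once you run it on the real 1,000-row DataFrame.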
<p>Share your hacks and learnings! You can find me on Twitter <a target="_blank" href="https://twitter.com/ynot_kip">@ynot_kip</a> if you have any questions or just want to say hi!</p>
<h2 id="heading-github-repo">GitHub Repo</h2>
<ul>
<li><a target="_blank" href="https://github.com/tonykipkemboi/ENS_subgraph_data">ENS_subgraph_data</a></li>
</ul>
<h2 id="heading-more-resources">More Resources</h2>
<ul>
<li><p>The Graph:<br />  <a target="_blank" href="https://thegraph.com/docs/en/about/#how-the-graph-works">About The Graph and How The Graph Works</a> and <a target="_blank" href="https://thegraph.com/docs/en/querying/querying-best-practices/">Querying Best Practices</a></p>
</li>
<li><p>Subgrounds:<br />  <a target="_blank" href="https://playgrounds-analytics.gitbook.io/playgrounds-docs/subgrounds/tutorials/subgrounds-workshop">Subgrounds Workshop</a> and <a target="_blank" href="https://playgrounds-analytics.gitbook.io/playgrounds-docs/subgrounds/the-basics">Subgrounds Docs</a></p>
</li>
<li><p>GraphQL:<br />  <a target="_blank" href="https://graphql.org/learn/">Introduction to GraphQL</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[What is a Governance Token?]]></title><description><![CDATA[Centralized Governance
In the centralized world, governance is delegated to a select few individuals we trust to represent our values and advocate for change on our behalf.
A good example is how we elect officials to government office. 
The officials...]]></description><link>https://thedataengineerblog.com/what-is-a-governance-token</link><guid isPermaLink="true">https://thedataengineerblog.com/what-is-a-governance-token</guid><category><![CDATA[dao-governance]]></category><category><![CDATA[DAOs]]></category><category><![CDATA[token]]></category><category><![CDATA[Governance]]></category><category><![CDATA[governance-token]]></category><dc:creator><![CDATA[Tony Kipkemboi]]></dc:creator><pubDate>Thu, 04 Aug 2022 18:03:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/YOCDD-D4oOM/upload/v1659634275969/zsFL1gRcQ.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-centralized-governance">Centralized Governance</h3>
<p>In the <em>centralized world</em>, governance is delegated to a select few individuals we trust to represent our values and advocate for change on our behalf.</p>
<p>A good example is how we elect officials to government office. </p>
<p>The officials are our delegates. We trust them to represent our views and vote on matters that reflect our choices, but as we all know, this does not always end up the way we want it to. Big corporations often end up swaying the votes to fit a different agenda.</p>
<blockquote>
<p><em>governance = power</em></p>
</blockquote>
<h3 id="heading-decentralized-governance">Decentralized Governance</h3>
<p>Enter the <em>decentralized world</em> of <a target="_blank" href="https://www.investopedia.com/terms/b/blockchain.asp">blockchain</a>! </p>
<p>Blockchain protocols are run by code; code is the law in the decentralized world. Instructions and consequent behaviors are coded into the protocols, which automatically execute once specific parameters are met. </p>
<p>This differs from centralized governance, where a few individuals decide and execute changes affecting millions of people.</p>
<p>A good example of decentralized governance where everyone participates in the management of the organization is known as a <strong><a target="_blank" href="https://ethereum.org/en/dao/">Decentralized Autonomous Organization (DAO)</a></strong>. Many DAOs focus on almost all aspects of our centralized world; explore existing DAOs on the <a target="_blank" href="https://app.daohaus.club/explore">DAOhaus</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1659634957439/fNuZo7sPJ.png" alt="A sample of the 732 DAOs you can explore on DAOhaus" /></p>
<h3 id="heading-governance-tokens">Governance Tokens</h3>
<p>There are several models for DAO membership, but I will focus on <em>Token-Based Membership</em>. One way to become a member of a DAO is by acquiring its tokens on a decentralized exchange.</p>
<p>We will dive deeper into how governance tokens work in a DAO by using an example of a <strong><a target="_blank" href="https://en.wikipedia.org/wiki/Chama_(investment)#:~:text=A%20Chama%20is%20an%20informal,group%22%20or%20%22body%22.">Chama</a></strong>; and no, it's not the village in Rio Arriba County in New Mexico. Chama is a Swahili word and a common term in East Africa, especially in Kenya, where it means an informal cooperative society where a group of people pools their financial resources for investment purposes. The most common one is the Merry Go Round (Rotating Savings and Credit Associations), where a fixed amount of money is collected from every member periodically—usually monthly—and paid out to one of the members on a rotating schedule. </p>
<p>The members at the end of the payment rotation have the highest risk because those paid early in the schedule have no incentive to keep up with payments. The Rotating Savings and Credit Association (ROSCA) solves this issue by front-loading the most trusted members in the payment rotation and placing the least trusted at the end.</p>
<p>The ROSCA methodology is better than a random structure but still presents risks. A DAO could solve most of the trust issues in this case. </p>
<p>I propose <strong>ChamaDAO</strong> as a solution. To be a member, you mint a free Chama NFT that grants you access to a token-gated community website and Discord channel. The NFTs give you the right to vote on proposals and are later replaced by a <strong>$CHAMA</strong>—an ERC-20 token. In this example, you need a specific amount of  <strong>$CHAMA</strong> tokens to be a member and consequently have voting rights in the DAO. As a member, you can trade the Chama token on decentralized exchanges allowing new members to join the DAO.</p>
<p>The <strong>$CHAMA is a governance token</strong> that represents each member's stake in ChamaDAO. By distributing control among members, we achieve <strong>on-chain governance</strong>. The members, as a collective, hold the power to change the DAO protocol, its foundational code, and have collective ownership and control over the DAO treasury. Suppose ChamaDAO members propose to buy real estate property in the suburbs of Nairobi, Kenya. Each member then votes <strong>for</strong> or <strong>against</strong> the proposal, and the voting result determines whether the contract automatically executes it. If the majority vote YES, the code executes accordingly and disburses funds from the DAO treasury to purchase the property; if they vote NO, nothing happens.</p>
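<p>The token-weighted voting described here can be sketched in a few lines of Python. This is a simplified, off-chain illustration only, not real smart-contract code; the member addresses, balances, and simple-majority rule are all hypothetical:</p>

```python
# hypothetical token-weighted vote tally for a ChamaDAO proposal:
# each member's vote is weighted by their $CHAMA balance
votes = {
    "0xMemberA": ("for", 150),      # (choice, $CHAMA balance)
    "0xMemberB": ("against", 40),
    "0xMemberC": ("for", 60),
}

# sum the token weight behind each side
for_weight = sum(bal for choice, bal in votes.values() if choice == "for")
against_weight = sum(bal for choice, bal in votes.values() if choice == "against")

# the proposal executes only if the token-weighted majority is "for"
proposal_passes = for_weight > against_weight
print(f"for: {for_weight}, against: {against_weight}, passes: {proposal_passes}")
```

Real DAO frameworks add quorum requirements, voting deadlines, and delegation on top of this basic tally, but the core idea is the same: voting power is proportional to token holdings.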
<h3 id="heading-takeaways">Takeaways</h3>
<p>Governance tokens present a new paradigm for community governance, but it is still too early to draw conclusions. Blockchain technology is very young, and it will likely change over the years to come as more people are onboarded and governments carve out new policies.</p>
<p>Make sure to do <strong>your</strong> due diligence before investing in this new ecosystem.</p>
<h3 id="heading-article-sources">Article Sources</h3>
<ol>
<li>Investopedia. "What Is a Blockchain?, <a target="_blank" href="https://www.investopedia.com/terms/b/blockchain.asp">https://www.investopedia.com/terms/b/blockchain.asp</a>" Accessed Mar. 9, 2022.</li>
<li>Ethereum. "Decentralized autonomous organizations (DAOs), <a target="_blank" href="https://ethereum.org/en/dao/">https://ethereum.org/en/dao/</a>" Accessed Mar. 9, 2022.</li>
<li>DAOhaus. "Explore DAOs, <a target="_blank" href="https://app.daohaus.club/explore">https://app.daohaus.club/explore</a>" Accessed Mar. 9, 2022.</li>
<li>Wikipedia. "Chama (investment), <a target="_blank" href="https://en.wikipedia.org/wiki/Chama_(investment)#:~:text=A%20Chama%20is%20an%20informal,group%22%20or%20%22body%22">https://en.wikipedia.org/wiki/Chama_(investment)</a>" Accessed Mar. 9, 2022.</li>
</ol>
]]></content:encoded></item></channel></rss>