Reducing Boilerplate on .NET IHost Applications

In this post, you’ll learn how to reduce boilerplate in .NET IHost applications to improve development speed, code readability and maintainability.

Hey, listen… Suppose you have a huge .NET solution with a bunch of Swagger projects or even pure MVC applications. If you look closer, you’ll realize that most of the code you wrote to initialize the applications is basically the same. That repeated code is known as boilerplate. Boilerplate code, while necessary, can lead to increased development time, reduced code maintainability, and potential errors. Fortunately, there are techniques to reduce it. In this post, I will show how to reduce boilerplate in .NET IHost applications, enhancing developer productivity and code quality.

This post will focus on dependency injection and extension methods. However, there are plenty of other ways to reduce boilerplate, such as T4 templates or tools like AutoFixture (for testing), which may be explored in future posts.

Dependency Injection (DI)

Dependency Injection is a fundamental concept in modern software development and plays a crucial role in reducing boilerplate. By leveraging DI, you can inject services and components where needed, eliminating the need for explicit instantiation and reducing repetitive code. The .NET Core framework has a built-in DI container that makes this process really easy.

The following code shows a service class and how to use it in a controller.

You must register MyDearService in Startup.cs or Program.cs with services.AddScoped<MyDearService>() so that the class can be injected into the controller.

public class MyDearService
{
    // Service implementation
    public async Task DoSomethingSpecial() {
        // special stuff here
    }
}

public class MyDearController : ControllerBase
{
    private readonly MyDearService _myDearService;

    // The DI container supplies MyDearService when it creates the controller
    public MyDearController(MyDearService myService)
    {
        _myDearService = myService;
    }

    [HttpGet]
    public async Task<IActionResult> MyNiceMethod() {
        // Controller logic using MyDearService
        await _myDearService.DoSomethingSpecial();
        return Ok();
    }
}

Extension Methods

To keep the Startup process clean and readable, encapsulate service registrations in extension methods. This not only reduces code size in the Startup process but also enhances readability and organization.

Example:

public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddBootstrapServices(this IServiceCollection services)
    {
        services.AddScoped<MyDearService>();
        // You can add as many services as you need
        return services;
    }
}

// Then you can call it in Startup.cs (if using .NET prior to v6)
public void ConfigureServices(IServiceCollection services)
{
    services.AddBootstrapServices();
    // Other configurations
}

// Or call it in Program.cs (if using .NET 6 or later), before builder.Build()
builder.Services.AddBootstrapServices();

Practical Example

As proposed at the beginning of this post, imagine a solution with many projects, each one being a swagger API. Now instead of configuring each one individually, let’s use a base project called Commons. The Commons project will contain all the jazz necessary to configure and boot the application.

Solution with two swagger APIs and a shared Commons project

The Commons project has a Boot class, with the needed code to run the application and auxiliary extension classes with methods to register services and configure the pipeline.

Of course, you’ll need to install the Swashbuckle package to use swagger-related methods such as AddSwaggerGen and UseSwaggerUI.

To install the package, use the Visual Studio NuGet manager or run the following command in the Commons project folder:

dotnet add package Swashbuckle.AspNetCore

ServiceExtensions class

This extension class holds the code to register all the necessary services. As this is a simple example, I’m only registering the Swagger-related services, but it could also register others, such as database access or clients for complementary REST services.

public static class ServiceExtensions
{
    public static IServiceCollection RegisterSwaggerServices(this IServiceCollection services)
    {
        // Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle
        services.AddEndpointsApiExplorer();
        services.AddSwaggerGen();

        return services;
    }
}

WebApplicationExtensions class

Here we should create all the methods needed to configure the pipeline. Again, I’m only configuring Swagger, but one could add methods to configure and extend the pipeline, for instance to read request headers.

public static class WebApplicationExtensions
{
    public static WebApplication ConfigureSwaggerPipeline(this WebApplication webApp)
    {
        // Configure the HTTP request pipeline.
        if (webApp.Environment.IsDevelopment())
        {
            webApp.UseSwagger();
            webApp.UseSwaggerUI();
        }
        return webApp;
    }
}

Boot class

This class has a single method that runs the main application and calls the extension methods builder.Services.RegisterSwaggerServices() and app.ConfigureSwaggerPipeline().

public static class Boot
{
    public static void Run(string[] args)
    {
        var builder = WebApplication.CreateBuilder(args);
        
        // Add services to the container.

        builder.Services.AddControllers();
        builder.Services.RegisterSwaggerServices();

        var app = builder.Build();

        app.ConfigureSwaggerPipeline();

        app.UseHttpsRedirection();
        app.UseAuthorization();
        app.MapControllers();
        app.Run();
    }
}

Program class

To run the SimpleApi1 and SimpleApi2 programs we have to write a single line of code and call the static method Run from the Boot class. Now we can see great improvement in speed and code reuse.

Boot.Run(args);

The programmer has to make changes only in the Commons project when a new service is needed by the APIs or when some adjustments must be made.

Conclusion

In the ever-evolving landscape of .NET development, reducing boilerplate code is crucial for clean, scalable and maintainable applications. Dependency Injection and simple Extension Methods are effective strategies to streamline .NET IHost applications. By implementing these simple techniques, developers can focus more on business logic and innovation, improving the overall code quality and development speed.

Get the source: https://github.com/raffsalvetti/ReduceBoilerplate

Theming Java Swing Applications

Learn step-by-step in this post how to create and apply themes to your Java Swing applications without using third-party libraries.

Stay awhile and listen… Java Swing is a robust framework for creating cross-platform graphical user interfaces (GUIs). While it provides a native look and feel across different platforms, there are times when you might want to customize the appearance of your Swing application to match your brand or achieve a particular aesthetic. This is where theming Java Swing applications comes into play. I will show how I themed a particular Java Swing application to give it a unique and personalized look.

I wrote the application back in 2002 or so, when I was teaching myself Java, and guess what, the code was a mess. It was intended to find Runewords for the game Diablo 2. A remastered version of Diablo 2 was released in 2021, so I “remastered” my old application to fit the game’s beauty.

The Basics

Before we dive into theming, it’s crucial to have a good understanding of how Java Swing works and how it paints components on screen. Swing uses a hierarchy of components, such as Frames, Panels, Layout Managers, Buttons, Labels, etc., to build the GUI. To theme a Swing application, you’ll need to work with various aspects like colors, backgrounds, fonts and borders to customize these components. It may also be necessary to create your own custom components at some point.

Look and Feel (L&F)

Swing applications can be themed using different Look and Feel libraries. Two popular ones are:

  • Metal: The default Swing L&F that provides a consistent look across different platforms.
  • Nimbus: A more modern and customizable L&F with support for theming.

You can choose the L&F that suits your application’s style and requirements. More of that can be found here.
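
For reference, switching the L&F takes only a couple of calls on UIManager. The sketch below is not part of the Runeword application; the class LookAndFeelSketch and its method names are hypothetical and exist only for illustration. It looks up an installed L&F by its display name and applies it if it is available.

```java
import java.util.Arrays;
import java.util.Optional;
import javax.swing.UIManager;
import javax.swing.UIManager.LookAndFeelInfo;
import javax.swing.UnsupportedLookAndFeelException;

final class LookAndFeelSketch {

    /** Returns the class name of the L&F whose display name matches, if any. */
    static Optional<String> findClassName(LookAndFeelInfo[] installed, String name) {
        return Arrays.stream(installed)
                .filter(info -> info.getName().equals(name))
                .map(LookAndFeelInfo::getClassName)
                .findFirst();
    }

    /** Applies the named L&F; returns false and keeps the current one if it is unavailable. */
    static boolean apply(String name) {
        Optional<String> className = findClassName(UIManager.getInstalledLookAndFeels(), name);
        if (!className.isPresent()) {
            return false;
        }
        try {
            UIManager.setLookAndFeel(className.get());
            return true;
        } catch (ReflectiveOperationException | UnsupportedLookAndFeelException ex) {
            // The L&F class could not be loaded or is not supported on this platform
            return false;
        }
    }
}
```

Calling LookAndFeelSketch.apply("Nimbus") before creating any components would switch to Nimbus where it is installed and silently keep the default otherwise.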

I created my custom components to match Diablo 2 look and feel, so I did not use any of these libraries.

The Original Application

I built this application to find valid Runewords using the runes the player has in their inventory. It has a filter panel with selectable runes (checkboxes) and a table showing results, if any, ordered by match count. That is, if the player has checked RuneA, RuneB and RuneC in the filter panel and a valid Runeword is “RuneC RuneA”, that fully matched combination will appear first.

The old application with the default theme
The original look and feel (running on a 2023 Java VM)

Although simple, this was a neat application back then. There were few websites with this kind of information, and even when the information was there, you had to search and filter the valid Runewords yourself.

New Features

Some new features that I included in this update:

  • Item type filter: a panel with item types to be selected and filtered;
  • Status bar: a bottom panel with information;
  • Match indication: selected runes and items will change color to green when matched and red when not matched;
  • Auto sizing to columns: the column adapts the size to contain text;
  • JSON database: database changed from XML to JSON file;

Customize the Theme

Inspiration

I took some in-game screen captures to guide the theming. With the game’s ambience in mind, I started to search for textures and colors to match it.

In-game screen capture showing the player's stash and inventory
Diablo 2 Player’s stash and inventory

The new game modified the inventory panel and added some new menus.

In-game screen capture showing the Gameplay menu and options
Diablo 2 Game configurations

The main goals are to mimic the fonts, colors and components like buttons and checkboxes which have unique aesthetics. Elements like the background color, the dark atmosphere and the stone-like texture of the menu and its borders are also important but can be adapted.

Custom Components

We need some custom components to create a menu like the game.

  • Checkbox: a themed checkbox using the image from the game;
  • Fake transparent panel: a JPanel that simulates transparency;
  • Fake transparent label: a JLabel that simulates transparency;
  • Table header renderer: a table cell renderer just for rendering the headers;
  • Table cell renderer: a table cell renderer that handles the color and format of each cell;
  • Table model: a table model to hold the Runeword recipe;

Transparency

At first, I thought that setting the opaque property to false would turn a JPanel into a transparent panel. Well, that is not always true. A small but annoying glitch happens when the panel sits over another panel that is also transparent: shadows appear around components like labels and checkboxes when they gain focus or change state.

To fix that problem I used a base panel with a background image and every time I had a need for a “transparent” component over it, I just used the paintComponent method to copy and paste the area from the background image to the background of the component making it “transparent”, as I’ll show next.

Textured JPanel

This is a simple component that paints a picture as a background. I used this panel as a container for other elements.

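import java.awt.Graphics;
import java.awt.Image;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.swing.JPanel;

public class TexturedJPanel extends JPanel {
    private Image bg;

    public TexturedJPanel(String bgPath) {
        try {
            bg = ImageIO.read(new File(bgPath));
        } catch (IOException ex) {
            // bg stays null, so the panel falls back to its default painting
            System.err.println("Could not load background " + bgPath + ": " + ex.getMessage());
        }
    }

    public TexturedJPanel(Image bg) {
        this.bg = bg;
    }

    @Override
    protected void paintComponent(Graphics g) {
        // Paint the default background first, then the texture on top of it
        super.paintComponent(g);
        if (bg == null) return;
        g.drawImage(bg, 0, 0, this);
    }
}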
Texture panel with a neat “stoned” background

Transparent JPanel

This is a simple custom component that extends its functionalities from the JPanel component except for the paintComponent method which simulates the transparency effect.

public class FakeTransparentPanel extends JPanel {

    private final ResourceLoaderComponent resourceLoader = ResourceLoaderComponent.getInstance();

    @Override
    protected void paintComponent(Graphics g) {
        var g2 = (Graphics2D)g;
        g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);

        // drawing the background
        g2.drawImage(
                resourceLoader.defaultBackgroundTexture,
                0,
                0,
                getWidth(),
                getHeight(),
                getX(),
                getY(),
                getX() + getWidth(),
                getY() + getHeight(),
                this);

        // filling alpha rectangle
        g2.setColor(resourceLoader.colorShadow);
        g2.fillRect(0, 0, getWidth(), getHeight());
    }
}

The paintComponent method copies a rectangle of the same size as the component from the image used as the background of the container panel and pastes it as the background of the panel being drawn. Then, another rectangle filled with a translucent black color (alpha channel less than 1) is painted over the component, darkening it.

In this particular case, getX(), getY(), getX() + getWidth() and getY() + getHeight() are the corners of the source rectangle in the image, and 0, 0, getWidth() and getHeight() are the corners of the destination rectangle in the component itself. This is kind of confusing because, in the argument list, the destination comes before the source, which may trick you.
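
The copy-and-darken step can be tried in isolation on off-screen images. The following is a minimal sketch, not the code from the application: the class FakeTransparencySketch and the method backgroundFor are hypothetical names, and x, y, w and h stand in for getX(), getY(), getWidth() and getHeight().

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class FakeTransparencySketch {

    /**
     * Builds the background for a component located at (x, y) with size w x h
     * inside a container that is painted with the given texture.
     * Pass null as shadow to skip the darkening step.
     */
    static BufferedImage backgroundFor(BufferedImage texture, int x, int y, int w, int h, Color shadow) {
        BufferedImage tile = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2 = tile.createGraphics();
        try {
            g2.drawImage(texture,
                    0, 0, w, h,          // destination corners come first
                    x, y, x + w, y + h,  // source corners come second
                    null);
            if (shadow != null) {
                // A translucent color painted on top darkens the copied area
                g2.setColor(shadow);
                g2.fillRect(0, 0, w, h);
            }
        } finally {
            g2.dispose();
        }
        return tile;
    }
}
```

Because the source rectangle starts at the component's own position, the pixel at (0, 0) of the result is the texture pixel at (x, y), which is what makes the component look see-through.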

Transparent panel with border and title

Themed JLabel

This component has the same construction as the previous one but also prints the label text.

public class FakeTransparentLabel extends JComponent {
    private final ResourceLoaderComponent resourceLoader = ResourceLoaderComponent.getInstance();

    //text label
    private String label;

    private final FontMetrics fontMetrics;

    private double labelLength;

    public FakeTransparentLabel() {
        super();
        fontMetrics = getFontMetrics(resourceLoader.defaultFont);
    }

    public String getLabel() {
        return label;
    }

    public void setLabel(String label) {
        this.label = label;
        if (this.label != null && !this.label.isEmpty()) {
            var c = new Canvas();
            var lm = fontMetrics.getStringBounds(label, c.getGraphics());
            labelLength = lm.getWidth();
            setPreferredSize(new Dimension((int) lm.getWidth(), fontMetrics.getHeight()));
        }
    }

    @Override
    protected void paintComponent(Graphics g) {
        var g2 = (Graphics2D)g;
        g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        g2.drawImage(
                resourceLoader.defaultBackgroundTexture,
                0,
                0,
                getWidth(),
                getHeight(),
                getX(),
                getY(),
                getX() + getWidth(),
                getY() + getHeight(),
                this);
        g2.setColor(resourceLoader.colorShadow);
        g2.fillRect(0, 0, getWidth(), getHeight());

        g.setColor(getForeground());
        g2.setFont(resourceLoader.defaultFont);
        g2.drawString(label, getWidth() / 2 - (int)labelLength / 2, getHeight() / 2 + fontMetrics.getHeight() / 4);
    }
}

Note that the method setLabel sets the text but also sets some metrics for drawing the text.

The method paintComponent is almost the same as the previous component except that it prints the string calling g2.drawString.
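
The label code above places the baseline with the approximation getHeight() / 2 + fontMetrics.getHeight() / 4. As an alternative, and not what the application does, the text position can be derived from the font's ascent and descent. The sketch below uses hypothetical names (TextCenteringSketch, centeredX, centeredBaseline) and keeps the arithmetic separate from Swing so it is easy to check.

```java
import java.awt.FontMetrics;

final class TextCenteringSketch {

    /** Left edge that centers text of the given width inside the component. */
    static int centeredX(int componentWidth, int textWidth) {
        return (componentWidth - textWidth) / 2;
    }

    /** Baseline that vertically centers a line with the given ascent and descent. */
    static int centeredBaseline(int componentHeight, int ascent, int descent) {
        int lineHeight = ascent + descent;
        return (componentHeight - lineHeight) / 2 + ascent;
    }

    /** Same calculation, reading ascent and descent from FontMetrics. */
    static int centeredBaseline(int componentHeight, FontMetrics fm) {
        return centeredBaseline(componentHeight, fm.getAscent(), fm.getDescent());
    }
}
```

Inside paintComponent this would be used as g2.drawString(label, centeredX(getWidth(), (int) labelLength), centeredBaseline(getHeight(), fontMetrics)).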

Transparent Label in red

Themed JCheckBox

I would say that after the fonts, colors and buttons, the checkbox is the most iconic interface item.

To make a checkbox component with the same characteristics, I used GIMP to copy the images that represent the checked and unchecked states and then created an empty class called MyCheckbox that extends JComponent. This class is almost a mixture of the other two classes, with an extra draw call for the checkbox image according to its checked/unchecked state.

//width of the image
private final int SPRITE_W = 22;

//height of the image
private final int SPRITE_H = 21;

//little gap between the image and text
private final int LABEL_GAP = 5;

//the text label
private String label;

//whether is checked or not
private boolean selected;

The rest of this class is quite ordinary, except for the following methods:

public void setLabel(String label) {
    this.label = label;
    if (this.label != null && !this.label.isEmpty()) {
        var c = new Canvas();
        var lm = fontMetrics.getStringBounds(label, c.getGraphics());
        setPreferredSize(new Dimension(SPRITE_W + LABEL_GAP + (int) lm.getWidth(), Math.max(SPRITE_H, fontMetrics.getHeight())));
    }
}

@Override
protected void paintComponent(Graphics g) {
    var g2 = (Graphics2D) g;
    g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);

    g2.drawImage(
            resourceLoader.defaultBackgroundTexture,
            0,
            0,
            getWidth(),
            getHeight(),
            getX(),
            getY(),
            getX() + getWidth(),
            getY() + getHeight(),
            this);

    g2.setColor(resourceLoader.colorShadow);
    g2.fillRect(0, 0, getWidth(), getHeight());

    if (selected) {
        g2.drawImage(resourceLoader.checked, 0, 0, null);
    } else {
        g2.drawImage(resourceLoader.unchecked, 0, 0, null);
    }

    g.setColor(getForeground());
    g2.setFont(resourceLoader.defaultFont);
    g2.drawString(label, minimumDimension.width + LABEL_GAP, getHeight() / 2 + fontMetrics.getHeight() / 4);
}

The setLabel method calculates the final size of the checkbox, including its label text, using the defined font, and paintComponent draws the component on the screen.

Transparent checkbox in yellowish color

The checkbox is glitchy in this example but it shows the transparency effect.

Themed JScrollBar

The scrollbar is divided into two parts. The first one is very simple: it is just a class that extends JScrollBar and sets the UI. The second part is the UI itself.

public class MyScrollBar extends JScrollBar {
    public MyScrollBar() {
        setUI(new MyScrollBarUI());
    }

    public MyScrollBar(int orientation) {
        this();
        setOrientation(orientation);
    }
}

The UI is more complex because the component is painted like the others I have shown here.

Here are the two most important pieces of code, a method to paint the thumb, called paintThumb and another to paint the thumb track or bar called paintTrack.

@Override
protected void paintTrack(Graphics g, JComponent c, Rectangle r) {
    Graphics2D g2 = (Graphics2D) g;
    g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);

    g2.drawImage(
            resourceLoader.defaultBackgroundTexture,
            0,
            BUTTON_SIZE,
            r.width,
            BUTTON_SIZE + r.height,
            r.x,
            r.y,
            r.x + r.width,
            r.y + r.height,
            null);

    g2.setColor(resourceLoader.colorShadow);
    g2.fillRect(0, BUTTON_SIZE, r.width, BUTTON_SIZE + r.height);
}

@Override
protected void paintThumb(Graphics g, JComponent c, Rectangle r) {
    Graphics2D g2 = (Graphics2D) g;
    g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
    g2.drawImage(resourceLoader.scrollThumb, r.x, r.y, null);
}

This class also implements the scrollbar buttons in a private class. There’s not much to see here except for the paintComponent method. Also, it’s worth mentioning that you should always set the preferred size of a custom component; otherwise the layout manager may give it no space and it will not be painted.

There is just one image (the arrow pointing up) for the scroll buttons. Note that I applied a transformation (scaling by -1 on both axes, which amounts to a 180° rotation) to get the arrow pointing down. If this scrollbar is used for horizontal scrolling, SwingConstants.EAST and SwingConstants.WEST should be implemented by applying the corresponding transformation.

Besides that, the method is pretty much the same as the other classes.

Here is the paintComponent method for the scroll buttons.

@Override
protected void paintComponent(Graphics g) {
    Graphics2D g2 = (Graphics2D) g;
    g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);

    AffineTransform t;
    var image = new BufferedImage(
            resourceLoader.scrollButton.getWidth(null),
            resourceLoader.scrollButton.getHeight(null),
            BufferedImage.TYPE_INT_ARGB);

    var ig = image.createGraphics();

    ig.drawImage(resourceLoader.scrollButton, 0, 0, null);
    ig.dispose();

    switch (orientation) {
        case SwingConstants.SOUTH:
            t = AffineTransform.getScaleInstance(-1, -1);
            t.translate(-image.getWidth(null), -image.getHeight(null));
            image = new AffineTransformOp(t, AffineTransformOp.TYPE_NEAREST_NEIGHBOR).filter(image, null);
            break;
        default:
            break;
    }

    g2.drawImage(
            resourceLoader.defaultBackgroundTexture, 
            0,
            0,
            getWidth(),
            getHeight(),
            getX(),
            getY(),
            getX() + getWidth(),
            getY() + getHeight(),
            null);

    g2.setColor(resourceLoader.colorShadow);
    g2.fillRect(0, 0, getWidth(), getHeight());

    g2.drawImage(image, 0, 0, null);
}
Transparent Scrollbar
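
The image transformation used for the scroll buttons can be isolated into small helpers. This is a sketch with hypothetical names (ArrowFlipSketch, flipVertically, rotateHalfTurn), not code from the application: rotateHalfTurn repeats the transform used in the SOUTH case above, while flipVertically mirrors the image top to bottom only. For a left-right symmetric arrow both give the same picture.

```java
import java.awt.geom.AffineTransform;
import java.awt.image.AffineTransformOp;
import java.awt.image.BufferedImage;

final class ArrowFlipSketch {

    /** Mirrors the image top to bottom: row y of the result is row (height - 1 - y) of the source. */
    static BufferedImage flipVertically(BufferedImage src) {
        AffineTransform t = AffineTransform.getScaleInstance(1, -1);
        // After negating y the image sits above the origin, so shift it back into view
        t.translate(0, -src.getHeight());
        return new AffineTransformOp(t, AffineTransformOp.TYPE_NEAREST_NEIGHBOR).filter(src, null);
    }

    /** Negates both axes, which is a 180 degree rotation, as in the SOUTH case. */
    static BufferedImage rotateHalfTurn(BufferedImage src) {
        AffineTransform t = AffineTransform.getScaleInstance(-1, -1);
        t.translate(-src.getWidth(), -src.getHeight());
        return new AffineTransformOp(t, AffineTransformOp.TYPE_NEAREST_NEIGHBOR).filter(src, null);
    }
}
```

Since the result does not depend on anything that changes between repaints, one option is to compute the transformed image once and cache it instead of rebuilding it on every paintComponent call.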

Conclusion

The new look of the application.

Custom theming in Java Swing applications can significantly enhance their visual appeal and user experience. By working with an appropriate Look and Feel, or with custom components, and playing around with colors, fonts, backgrounds and borders, you can create a unique GUI that aligns with the application’s desired aesthetics.

Get the sources: https://github.com/raffsalvetti/DiabloRuneWordItemFinder

See ya!

Web Scraping User Reviews from Amazon.com

Learn in this post how to scrape user reviews from the Amazon.com website using .NET 6 and Playwright step-by-step.

Pssst… over here… In the world of e-commerce, user reviews play a crucial role in influencing other people’s purchasing decisions. Besides that, we can harness this data to possibly train some neural network to classify reviews or even create pseudo reviews for a product. Using tools like Playwright and .NET Core, we can create a web scraper to collect these review data. In this blog post, I’ll show how to extract data from user reviews of books on Amazon.com.

Setting Up Your Environment

Before diving into the procedure, make sure you have the .NET 6 SDK installed. You can download it from the official .NET website (I’m using Ubuntu Linux. If you are too, check the instructions here). After that, create a new C# console application project using your favorite IDE or the command line. I’ll use Visual Studio Code and the .NET command line tool.

mkdir AmazonScraper && cd AmazonScraper && dotnet new console --use-program-main

This command will create a directory called AmazonScraper, cd into it and create a new .NET console application with the old program style (with the main method).

Installing Playwright

Playwright is an automation library like Selenium or Puppeteer that allows you to control web browsers programmatically. You can install the Playwright NuGet package using the following command in your project directory:

dotnet add package Microsoft.Playwright

This will install Playwright and its dependencies into your project. You can run Playwright with a Chrome/Chromium or Firefox browser that you already have installed on your system, but it is also possible to install an “embedded” browser to ship with your program. If that sounds useful, check how to install it here.

First Test, Navigate to Amazon.com

This program can be divided into two main functions: navigation and data export. As you may have guessed, the Navigate method navigates through pages and collects all the data needed into a Review list. It also saves the current state of the execution. The Export method does precisely what its name says: it exports the data as a JSON file.

As mentioned before, I’m using the system-installed Chromium browser just by defining the ExecutablePath property of BrowserTypeLaunchOptions and passing it to the LaunchAsync method. It is worth mentioning that if you want to run the browser without rendering the main window, you have to set Headless = true (or leave the Headless property undefined, since Headless = true is the default):

namespace AmazonScraper;

using Microsoft.Playwright;
using System.Threading.Tasks;

public class Program {

    public static async Task Navigate() {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync(
            /*
                this is important if you did not install the "embedded"
                browser. here I am pointing to the installed location of
                chromium
            */
            new BrowserTypeLaunchOptions()
            {
                /*
                    Headless = true if you want to run the browser
                    without rendering the main window
                */
                Headless = false,
                /*
                    system browser executable path
                */
                ExecutablePath = "/snap/bin/chromium" 
            }
        );

        var page = await browser.NewPageAsync();
        await page.GotoAsync("https://www.amazon.com.br/s?k=livros");
    }

    public static async Task Export() {
        
    }

    public static async Task Main(string[] args) {
        await Navigate();
        await Export();
    }
}

If you run this code, you should see a browser window and it will navigate automatically to the Amazon books page.

The goal is to scrape as many reviews as possible, so I built an automatic navigation that recognizes all the book elements in the root category and then navigates to each book’s review page.

Playwright provides a test generator that can generate C# code while you navigate manually through the site. The generator will recognize mouse clicks, the text you put on some textbox, read the XPath/Selector for each element and build all the jazz for you. You can find more about this kind of profanity here.

I went the fun way and dug out the XPath/Selectors for each element using the browser dev tools/console.

Review Object

This class has properties to identify the review.

public class Review
{
    /// <summary>
    /// review unique id
    /// </summary>
    public string Id { get; set; }

    /// <summary>
    /// product unique id
    /// </summary>
    public string ProductId { get; set; }

    /// <summary>
    /// review title
    /// </summary>
    public string Title { get; set; }

    /// <summary>
    /// review rating 1 to 5 stars
    /// </summary>
    public decimal Rating { get; set; }

    /// <summary>
    /// review body
    /// </summary>
    public string Comment { get; set; }
}

Execution State and Configuration Object

I created the CurrentState object to give some flexibility, track how the process is going and add some failure handling. It has properties like current URL, next URL, max pages to navigate, etc.

public class CurrentState
{
    /// <summary>
    /// current product url
    /// </summary>
    public string ProductsUrl { get; set; }

    /// <summary>
    /// all product ids on the current page
    /// </summary>
    public List<string> ProductList { get; set; }

    /// <summary>
    /// the product id that is currently being processed
    /// </summary>
    public string CurrentProduct { get; set; }

    /// <summary>
    /// the next product page url
    /// </summary>
    public string NextUrl { get; set; }

    /// <summary>
    /// the maximum number of pages to be read
    /// </summary>
    public int MaxPages { get; set; }

    /// <summary>
    /// delay time in seconds between url navigations
    /// </summary>
    public int Delay { get; set; }

    /// <summary>
    /// the number of the page that is currently being processed
    /// </summary>
    public int CurrentPage { get; set; }

    /// <summary>
    /// the review list
    /// </summary>
    public List<Review> Reviews { get; set; }

    /// <summary>
    /// amazon store language (pt-BR, en-GB, en-US...)
    /// </summary>
    public string StoreLanguage { get; set; }

    /// <summary>
    /// amazon product review base url
    /// </summary>
    public string ProductReviewBaseUrl { get; set; }

    /// <summary>
    /// amazon product base url
    /// </summary>
    public string AmazonBaseUrl { get; set; }
}

Also, I added some functions and variables to the file Program.cs to load and save the state.

Define a constant to hold the name of the state file:

private const string CurrentStateFileName = "current_state.json";

Initialize the CurrentState object with default values.

private static CurrentState currentState = new()
{
    MaxPages = 1,
    AmazonBaseUrl = "https://www.amazon.com.br",
    ProductReviewBaseUrl = "https://www.amazon.com.br/product-reviews",
    ProductsUrl = "https://www.amazon.com.br/s?k=livros",
    Delay = 5,
    ProductList = new List<string>(),
    Reviews = new List<Review>(),
    StoreLanguage = "pt-BR"
};

And write the functions to read and write the current state.

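// Requires "using Newtonsoft.Json;" at the top of Program.cs.
// Both helpers are static because they use the static currentState field.
private static async Task SaveCurrentState()
{
    Console.WriteLine($"Saving current state to {CurrentStateFileName}");
    await File.WriteAllTextAsync(CurrentStateFileName, JsonConvert.SerializeObject(currentState));
}

private static async Task LoadCurrentState()
{
    if (!File.Exists(CurrentStateFileName)) return;
    Console.WriteLine($"Loading current state from {CurrentStateFileName}");
    var tFile = await File.ReadAllTextAsync(CurrentStateFileName);
    // Keep the defaults if the file is empty and deserialization returns null
    currentState = JsonConvert.DeserializeObject<CurrentState>(tFile) ?? currentState;
}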

I used Newtonsoft Json to handle the serialization and deserialization of objects. Add it to the project by running:

dotnet add package Newtonsoft.Json

Improving Navigation

Once the book page is loaded, use Playwright methods like QuerySelectorAllAsync, QuerySelectorAsync, GetAttributeAsync and InnerTextAsync to interact with page elements and extract the relevant information.

Here is an example:

var productId = await productContainer.GetAttributeAsync("data-asin");

This line of code is getting the product ID from a data attribute defined on an HTML tag, from a div in this particular case.

The navigation procedure is straightforward and follows this diagram.

At the start point “next URL” is the product URL defined via the current state JSON file or at the CurrentState object initialization, if there is no previous JSON file.

Navigation method flow diagram
Flow diagram for the navigation function

It is worth mentioning that each URL navigation is preceded by a delay with a bit of random drift, a small wait to simulate a person browsing through products and reviews. I do not know whether Amazon will block requests or ask for a captcha, but keep this kind of trick in mind.

The histogram mentioned in the flow diagram is a star gauge widget that groups reviews by rating, like the one below.

Example of a review histogram
Review histogram

Data Filtering and Exporting

At this point, you should have a pretty neat collection with hundreds of reviews. Filtering and manipulating this data is made easy by using LINQ.

First of all, let’s exclude duplicates if any.

var reviews = currentState.Reviews
    .GroupBy(x => x.Id)
    .Select(x => x.First()) //exclude duplicates
    .OrderBy(x => x.Rating)
    .ToList();

Now let’s remove double spaces, new lines and some special characters that appeared somehow. Also, I changed the ratings according to my needs.

I’ll use this data to train an AI to categorize short sentences into sentiments that could be negative, neutral or positive. To achieve that goal, I translated the ratings from a five-level (five stars) scale to a three-level one. That is, ratings less than three become negative with a value of 0, ratings equal to three become neutral with a value of 1, and the rest become positive with a value of 2.

reviews.ForEach(r =>
{
    if (Regex.IsMatch(r.Comment, @"\s{2,}"))
    {
        r.Comment = Regex.Replace(r.Comment, @"\s{2,}", " ", RegexOptions.Multiline);
    }
    if (Regex.IsMatch(r.Comment, @"\xA0"))
    {
        r.Comment = Regex.Replace(r.Comment, @"\xA0", "", RegexOptions.Multiline);
    }
    if (Regex.IsMatch(r.Comment, @"\n"))
    {
        r.Comment = Regex.Replace(r.Comment, @"\n+", "", RegexOptions.Multiline);
    }
    if(r.Rating < 3) r.Rating = 0;
    if(r.Rating == 3) r.Rating = 1;
    if(r.Rating > 3) r.Rating = 2;
});

After that, I saved it to a JSON file with a sample list containing the same number of each one of the sentiments.

Conclusion

As you extract reviews, you can store them in a suitable data structure or save them to a database. Afterward, you can use the reviews to play with sentiment analysis or maybe gain insights into the book’s reception.

In conclusion, web scraping user reviews of books on Amazon opens up a world of possibilities for extracting valuable information. From setting up the environment to navigating web pages, extracting reviews, and analyzing the data, this guide has introduced you to the fundamentals of the process.

Note: Remember to adhere to ethical guidelines while scraping data from websites.

Get the sources: https://github.com/raffsalvetti/AmazonScraper

Until the next one!