
smart_ui

App that generates a JSON file containing the details of various UI components from the screenshot of a wireframe. [Submission for Smart UI (Techfest 2020-21)]

Smart UI (Team ID: SU-208042)

Problem Statement

Generate a JSON file (output.json) from the screenshot of the wireframe. The file should contain the details of various elements used in the webpage.

Link to Detailed Problem Statement


Our Solution

App Demo

Usage:

Clone the Repository:

> git clone https://github.com/tezansahu/smart_ui_tf20.git
> cd smart_ui_tf20/

Create & activate a virtual environment in Anaconda:

> conda create -n smartui python=3.7
> conda activate smartui

Note: Installing the dependencies & running the app works best in Anaconda with Python 3.7 & tensorflow=2.3.1

Install the dependencies:

(smartui)> conda install -c conda-forge --file requirements.txt
(smartui)> pip install keras-ocr

Note: Since keras-ocr is not found in any of the Anaconda channels, it has to be installed separately as mentioned above.

Download necessary models & repositories for running the app:

(smartui)> python models/download_models.py   # Download the models to be used by the app
(smartui)> git clone https://github.com/tesseract-ocr/tessdata.git ./app/tessdata/   # Clone the tessdata/ for legacy OCR being used in the app

Start the app:

(smartui)> cd app/
(smartui)> streamlit run app.py

This should start the app on localhost & open a tab in the browser.


Overview:

We use a three-stage pipeline to generate a JSON file containing the attributes of all the components in the input wireframe image. We then render the JSON output into an HTML file.
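The overall flow can be sketched roughly as below (a minimal illustration only; the stage functions are placeholders for the real implementations described in the following sections, not this repository's API):

import json
from pathlib import Path

def detect_and_classify(image_path):
    # Stage 1 placeholder: detect UI components in the wireframe & classify them
    return []

def extract_attributes(image_path, components):
    # Stage 2 placeholder: enrich each component with its text, font & color attributes
    return components

def render_html(components):
    # Stage 3 (bonus) placeholder: render the components into an HTML page
    return "<html><body></body></html>"

def run_pipeline(image_path, out_dir="."):
    components = detect_and_classify(image_path)
    enriched = extract_attributes(image_path, components)
    Path(out_dir, "output.json").write_text(json.dumps(enriched, indent=2))
    Path(out_dir, "index.html").write_text(render_html(enriched))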

The pipeline, as shown in the figure, comprises the following stages -

UI Component Detection & Classification

Taking inspiration from the paper Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination?, we apply a hybrid approach (Traditional Image-Processing + Deep Learning) to identify the various UI Components from a Wireframe Image. The two stages of this process are described below.

Image-Processing Based UI Element Detection:

Given a wireframe image as input, we first segment out the regions containing probable UI components using image-processing techniques such as contour detection. Details about the implementation can be found here.

Note: This portion of the code is an adapted version of the code found in UI Element Detection (UIED).
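
To illustrate the idea, a simplified contour-based region proposal step could look like the following sketch (the actual implementation, adapted from UIED, uses additional pre-processing & merging heuristics):

import cv2

def propose_ui_regions(image_path, min_area=100):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Binarize so that component outlines stand out from the background
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Each sufficiently large external contour becomes a candidate UI element
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h >= min_area:
            boxes.append((x, y, w, h))
    return boxes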

Deep Learning Models (for Classification):

The following DL Models (CNNs) have been developed to categorize the detected UI elements into specific classes:

  1. CNN Trained on RICO Dataset (Pretrained & used in UIED)
  2. CNN Trained on Wireframes Dataset (provided by organizers) (Transfer Learning using Model 1 as Base Model)
  3. CNN Trained on Generalized Dataset (Wireframes + ReDraw Dataset) (Transfer Learning using Model 2 as Base Model)

Details about the models, including their training, performance & downloadable weights, can be found here.
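
A rough sketch of the transfer-learning setup behind Models 2 & 3 is shown below; the layer sizes, class count & model path are illustrative placeholders, not the trained models shipped with the app:

from tensorflow.keras import layers, models

def build_transfer_model(base_model_path, num_classes):
    base = models.load_model(base_model_path)           # e.g. the CNN pretrained on RICO
    # Reuse the convolutional feature extractor, drop the old classification head
    feature_extractor = models.Model(base.input, base.layers[-2].output)
    feature_extractor.trainable = False                 # freeze while fine-tuning the new head
    model = models.Sequential([
        feature_extractor,
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model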

Text & Attribute Extraction for Identified Components

For each component identified in the previous stage of the pipeline, we extract its relevant attributes & styles, along with any text the element may contain. These attributes include font- & color-related properties. Details about the text recognition & attribute extraction process can be found here. The output is a JSON file containing the identified UI elements along with their attributes (properties).
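
For illustration, per-component text recognition with keras-ocr could look roughly like the sketch below (the app also uses the legacy Tesseract OCR via the cloned tessdata/, which is not shown here):

import keras_ocr

ocr_pipeline = keras_ocr.pipeline.Pipeline()   # downloads the detector & recognizer weights on first use

def extract_text(component_crop):
    # component_crop is an RGB NumPy array cropped around a detected UI element
    predictions = ocr_pipeline.recognize([component_crop])[0]
    # predictions is a list of (word, box) pairs; join the words left-to-right
    words = [word for word, box in sorted(predictions, key=lambda p: p[1][0][0])]
    return " ".join(words)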

HTML Generation from JSON (Bonus)

The JSON file generated above is passed through the HTML generation pipeline to obtain an HTML file that renders the components identified from the wireframe image into a webpage. Details of this process can be found here.

Note: The HTML code generated may not be of the best quality, but can render the JSON with sufficient accuracy to relate it to the original image input.
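
A bare-bones JSON-to-HTML renderer could look like the sketch below; the element schema (type, x, y, text) is hypothetical, and the real generator also applies the extracted font & color styles:

import json

def json_to_html(json_path, html_path):
    # Assumes a flat list of element dicts with "type", "x", "y" & "text" keys (illustrative schema only)
    with open(json_path) as f:
        elements = json.load(f)
    tag_map = {"button": "button", "text": "p", "checkbox": "input"}
    body = []
    for el in elements:
        tag = tag_map.get(el.get("type"), "div")
        style = f"position:absolute; left:{el.get('x', 0)}px; top:{el.get('y', 0)}px;"
        body.append(f'<{tag} style="{style}">{el.get("text", "")}</{tag}>')
    html = "<html>\n<body>\n" + "\n".join(body) + "\n</body>\n</html>"
    with open(html_path, "w") as f:
        f.write(html)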


Created with ❤️ by Rishabh Arya, Shreya Laddha & Tezan Sahu