DSL Encoding

Revision as of 18:57, 3 May 2019

The current scripts, which I wrote by following the pix2code source code, are located in

E:/projects/embedding

So far, I have experimented with only one simple DSL file, '00CDC9A8-3D73-4291-90EF-49178E408797.gui'. To see the current output (not yet one-hot encoded), run

python convert_gui.py

This opens the DSL file, goes through every line, strips some symbols, and stores the resulting tokens (one per line) in a list. The ''tokens'' variable now looks something like this:

tokens
['header ',
 'btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive',
 '']
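convert_gui.py itself is not reproduced on this page, so here is a minimal sketch of the tokenization step. It assumes that the stripped symbols are the curly braces `{` and `}` plus the line break; that assumption is consistent with the tokens above ('header ' keeps its trailing space, and a line containing only '}' becomes ''), but the real script may strip more. The function name `tokenize_gui` and the sample text are hypothetical.

```python
def tokenize_gui(gui_text):
    """Hypothetical tokenizer: one token per line of a pix2code-style DSL file.

    Assumption: the only symbols to strip are '{', '}' and the line break,
    which reproduces the token list shown above but may differ from
    convert_gui.py.
    """
    tokens = []
    for line in gui_text.splitlines():  # splitlines() drops the line breaks
        token = line.replace("{", "").replace("}", "")
        tokens.append(token)
    return tokens


# Hypothetical sample shaped like the start of the example .gui file.
# For the real file: tokens = tokenize_gui(open(path).read())
sample = (
    "header {\n"
    "btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive\n"
    "}\n"
)

print(tokenize_gui(sample))
```

Because every line becomes exactly one token, a whole button row such as 'btn-inactive, btn-active, ...' is treated as a single symbol rather than split on commas.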

Based on this list, we can build the vocabulary, i.e. the sorted set of unique tokens:

chars = sorted(list(set(tokens)))

which results in

['',
 'btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive',
 'header ',
 'quadruple ',
 'row ',
 'single ',
 'small-title, text, btn-green',
 'small-title, text, btn-orange',
 'small-title, text, btn-red']

As we can see, there are 9 unique tokens in this example, so each one-hot vector has length 9. Next, we assign a number to each token; that number is the index of the token's position in the vector.

char_indices = dict((c, i) for i, c in enumerate(chars))

indices_char = dict((i, c) for i, c in enumerate(chars))

This results in

char_indices
{'': 0,
 'btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive': 1,
 'header ': 2,
 'quadruple ': 3,
 'row ': 4,
 'single ': 5,
 'small-title, text, btn-green': 6,
 'small-title, text, btn-orange': 7,
 'small-title, text, btn-red': 8}

Hence, a line with the token 'header ' has the one-hot representation [0,0,1,0,0,0,0,0,0]. The single 1 sits at index 2, which is the index assigned to 'header ' in char_indices (indices start at 0).
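The lookup for a single token can be sketched as a small function. This is an illustration rather than code from convert_gui.py: the helper name `one_hot` is hypothetical, and the vocabulary is copied from the `chars` list printed above.

```python
import numpy as np

# Vocabulary copied from the sorted `chars` list shown above
chars = ['',
         'btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive',
         'header ',
         'quadruple ',
         'row ',
         'single ',
         'small-title, text, btn-green',
         'small-title, text, btn-orange',
         'small-title, text, btn-red']
char_indices = {c: i for i, c in enumerate(chars)}


def one_hot(token):
    """Hypothetical helper: return the one-hot vector for a single token."""
    vector = np.zeros(len(chars))
    vector[char_indices[token]] = 1  # raises KeyError for unseen tokens
    return vector


print(one_hot('header '))  # the 1 sits at index 2
```

The lookup is an exact string match, so `one_hot('header')` without the trailing space raises a `KeyError`. Stripping whitespace during tokenization would avoid that, at the cost of producing tokens that differ from the list above.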

Now, let's apply this one-hot encoding rule to our GUI file:

import numpy as np

# Each token (one line of the GUI file) becomes one row of the matrix
sentences = list(tokens)

one_hot_vector = np.zeros((len(sentences), len(chars)))

# Set a single 1 per row, at the column assigned to that token
for t, char in enumerate(sentences):
    one_hot_vector[t, char_indices[char]] = 1

The resulting matrix that represents our GUI has one row per token (21 rows × 9 columns here):

array([[0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0.]])
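`indices_char` is built earlier but not used in the steps so far. One natural use is decoding: the argmax of each row gives the token's index, and `indices_char` maps it back to the token. The sketch below also shows an equivalent vectorized way to build the matrix by indexing rows of an identity matrix (`np.eye`). The function names `encode` and `decode` and the short token list are hypothetical, chosen only to keep the example self-contained.

```python
import numpy as np

# Hypothetical short token list in the same style as `tokens` above
tokens = ['header ',
          'btn-inactive, btn-active, btn-inactive, btn-inactive, btn-inactive',
          '',
          'row ',
          'single ',
          'small-title, text, btn-green',
          '',
          '']

chars = sorted(set(tokens))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}


def encode(tokens):
    """Vectorized one-hot encoding: row i of np.eye(V) is the one-hot vector for index i."""
    indices = [char_indices[t] for t in tokens]
    return np.eye(len(chars))[indices]


def decode(matrix):
    """Map each one-hot row back to its token via argmax and indices_char."""
    return [indices_char[int(i)] for i in matrix.argmax(axis=1)]


one_hot_matrix = encode(tokens)
print(one_hot_matrix.shape)              # (8, 6): 8 tokens, 6 unique
print(decode(one_hot_matrix) == tokens)  # True: the encoding round-trips
```

Since every row must contain exactly one 1, checking `one_hot_matrix.sum(axis=1)` for all ones is a cheap sanity test for a matrix like the one above.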

Hiep
